IWETHEY v. 0.3.0 | TODO
1,095 registered users | 0 active users | 0 LpH | Statistics
Login | Create New User
IWETHEY Banner

Welcome to IWETHEY!

New I have never had an under spec'd system.
Like box, Hardware guys used to seeing projects from Software guys that have *THIS* much processing requirements is sort of a crap shoot.

I usually ask the developers/project leads what they believe the project load for hard ware is. I then note this... then go find similar project implementations, ask questions about them. Get answers from the people involved in it...

Then pull an answer from /dev/ass add anywhere from 10%-50% (depending on the application language and average programming skill of "our developers" in the language).

This leads to a heated argument about the machine costs. I then make a compromise of "projected" costs from others, with the contingency of my costs in reserve. Nearly^W e^HEvery time, my reserve is allocated for the project.

Usually what end up happening, is most groups (at the organization I work for) come up with a price, then ask me to come up with a price and compare. Usually they spend my amount and things work very well... with a bit of breathing room.

Case in point, I had BEA Application server, rather than use large Java programs with every thing in them, they used thousands of small programs. This machine was running at a Load average of 200-300, but with sub 1 sec response time. Everyone was freaking out about the high load average. I was watching the machine closely, Idle time was ~70%, memory usage was ~50%, disk I/O wait time was ~1%, swapping was not there period.

Turns out the BEA app server was executing nearly a thousand of these small proggys a second. With room to spare. We ended up running 3 of these appservers on a machine that had 4 processors (these were p-Series AIX machines). We made each appserver have processor affinity and allocated 1/4 of the memory to each application server. There were times at during initial student registration this one machine would have Load Average of 500, they added 3 more machines setup this way after I left. They never did really need them, re-allocated 2 to front-end serving and left 2 machines to do the middle tier of the 3 tier application.

I don't know how things are doing now, as the whole IT Department was cut-up and re-distributed among other departments.
--
[link|mailto:greg@gregfolkert.net|greg],
[link|http://www.iwethey.org/ed_curry|REMEMBER ED CURRY!] @ iwethey
Freedom is not FREE.
Yeah, but 10s of Trillions of US Dollars?
SELECT * FROM scog WHERE ethics > 0;

0 rows returned.
New More to the point is will it fit on one machine?
Sample problem that illustrates the point.

You have to write a web service that takes a US social security number. The service returns true if the number is assigned to a living US citizen, false otherwise. You receive an export file from the federal government every month enumerating all the valid ssns.

Solve this problem.

Key questions - how efficient is your lookup in terms of n (where n is number of SSNs in data set)?
How much space does it take to store one SSN? In what format?
What is the worst case memory utilization (100% SSNs in use)?
Can you fit it on one garden variety 32 bit machine or will you need to partition the data set?

This is one of my fave interview questions because it is not at all clear whether you can fit the data set into RAM or not without doing some actual figuring. The lookup algo gives people fits as well.



[link|http://www.blackbagops.net|Black Bag Operations Log]

[link|http://www.objectiveclips.com|Artificial Intelligence]

[link|http://www.badpage.info/seaside/html|Scrutinizer]
New I guess I think to simply.
a single cgi-bin script can easily do the lookup using grep with or without Perl.
The data can be stored 9 digits plus linefeed in a straight text file
Why store in memory. But you could if you wanted to, in about 1GB.

Data shouldn't be more than 1GB or so.

If the government supplies that data, the thing that could take the most time is the conversion from what they provide to what you need. If you are only validating the info... why would you need more than a CGI shell script?

I guess I look at things to simply.
--
[link|mailto:greg@gregfolkert.net|greg],
[link|http://www.iwethey.org/ed_curry|REMEMBER ED CURRY!] @ iwethey
Freedom is not FREE.
Yeah, but 10s of Trillions of US Dollars?
SELECT * FROM scog WHERE ethics > 0;

0 rows returned.
New You would filewalk for every query?
What is the worst case for a single transaction?

If you had 500 hit within 2 seconds, how many will you be able to satisfy before a person gets annoyed and presses cancel?

How about 5,000? Keep going.


Basically, simple may be OK for a single activity, but it rarely scales, at least cost-effectively.
New RAMFS?
--
[link|mailto:greg@gregfolkert.net|greg],
[link|http://www.iwethey.org/ed_curry|REMEMBER ED CURRY!] @ iwethey
Freedom is not FREE.
Yeah, but 10s of Trillions of US Dollars?
SELECT * FROM scog WHERE ethics > 0;

0 rows returned.
New Doesn't matter
You've got to go through each bottleneck during an access and then see where people start piling up.

How fast can a program read a file, even if it is pure ram disk.

Hmm - I seem to recall moving data at 300MB per second when walking through a fully cached 1GB file on an Opteron - so let's say if you average out finding stuff in 1/2 the time, ie: 1.5 seconds for that piece of the puzzle.

So, you can server 6 or 7 queries in about 10 seconds, which is what I consider an absolute, worst case, will not wait any longer until I hit the STOP button.

But it is really a waste of activity anyway.

So then your choice is to start working through access and storage. It is preferable if you can code a single program to load the data in memory, and this program has very quick access to the data, and then this program can publish a server interface (simple tcp/ip socket, etc), which other programs can use.

And then you think in terms of memory usage. You don't really need to actually store the data if you can come up with a representation of existence. You'd rather use less resources if possible, as long as it doesn't slow down your access.

I seem to recall this particular task lends itself to bitmapped scoreboarding. Also, which a bit of knowledge of the data type, I seem to recall only certain ranges of numbers are used.

So when you end up compressing what you store, and add a bit of smarts to the search method, you go from serving a couple of dozen query a minute to a couple of thousand a second. At that point you find your networking socket creation becomes the bottleneck, rather than the actual query, so then you might move tho optimizing that. Or you may find a couple of cheap load balanced machines takes care of your complete universe of possible usage, so you stop.
New That would be a radix-search, I think.
Ideal for numeric data like SSNs.
"Don't give up!"
New Nah, you want a hash
You always want a hash. O(1) is really hard to beat.



[link|http://www.blackbagops.net|Black Bag Operations Log]

[link|http://www.objectiveclips.com|Artificial Intelligence]

[link|http://www.badpage.info/seaside/html|Scrutinizer]
New I thought a radix search was also O(1).
Perhaps I'm misremembering, but a radix-search is very similar to a hash lookup, but uses the value itself directly rather than hashing it first. I think in practice it's a little more complicated than that, but that's the idea.

Wade.
"Don't give up!"
New Gah - my bad
I was thinking of radix sort - which is O(nk) where k is key size.

Search is, as you say, proportional to k or O(1).



[link|http://www.blackbagops.net|Black Bag Operations Log]

[link|http://www.objectiveclips.com|Artificial Intelligence]

[link|http://www.badpage.info/seaside/html|Scrutinizer]
New Also, I just thought about your file size
It is a LOT worse than what you thought if you actually have each number, so our initial query time for a filewalk just jumped tremendously.
New On a 32 bit machine?



[link|http://www.blackbagops.net|Black Bag Operations Log]

[link|http://www.objectiveclips.com|Artificial Intelligence]

[link|http://www.badpage.info/seaside/html|Scrutinizer]
New Ever heard of PAE?
It has been a part of Intel 32-bit for a LONG time.
--
[link|mailto:greg@gregfolkert.net|greg],
[link|http://www.iwethey.org/ed_curry|REMEMBER ED CURRY!] @ iwethey
Freedom is not FREE.
Yeah, but 10s of Trillions of US Dollars?
SELECT * FROM scog WHERE ethics > 0;

0 rows returned.
New I'd call that exotic technology



[link|http://www.blackbagops.net|Black Bag Operations Log]

[link|http://www.objectiveclips.com|Artificial Intelligence]

[link|http://www.badpage.info/seaside/html|Scrutinizer]
New No, it is bone stock on everything since the P3 from Intel
The most support Chipset supports PAE, except most MB manufacturers didn't enable it.
--
[link|mailto:greg@gregfolkert.net|greg],
[link|http://www.iwethey.org/ed_curry|REMEMBER ED CURRY!] @ iwethey
Freedom is not FREE.
Yeah, but 10s of Trillions of US Dollars?
SELECT * FROM scog WHERE ethics > 0;

0 rows returned.
New And AMD? PowerPC? ARM?
>most MB manufacturers didn't enable it.

So it would be a bad idea to count on it I think.

There's a lot more reasons I'd discount this solution. For one - the problem says a 32 bit machine. It doesn't say a 32 bit Intel machine. It doesn't say a 36 bit machine - which is essentially what PAE gives you if its enabled. It says - a vanilla 32 bit machine.

I'm not saying PAE is bad. I'm not saying it isn't widely available. I'm saying it is outside of the problem definition.



[link|http://www.blackbagops.net|Black Bag Operations Log]

[link|http://www.objectiveclips.com|Artificial Intelligence]

[link|http://www.badpage.info/seaside/html|Scrutinizer]
New You said COMMON 32-bit
Power,ARM even AMD are also outside the problems definition.


Edit:
Excuse me, "garden variety 32 bit"

Garden Variety... Doesn't mean exotics like Power, ARM, MIPS, MIPS-EL, SH, s390 or even AMD.
--
[link|mailto:greg@gregfolkert.net|greg],
[link|http://www.iwethey.org/ed_curry|REMEMBER ED CURRY!] @ iwethey
Freedom is not FREE.
Yeah, but 10s of Trillions of US Dollars?
SELECT * FROM scog WHERE ethics > 0;

0 rows returned.
Expand Edited by folkert Nov. 27, 2006, 02:35:31 PM EST
New That kind of weasel won't get you a job
And I said 'garden variety'



[link|http://www.blackbagops.net|Black Bag Operations Log]

[link|http://www.objectiveclips.com|Artificial Intelligence]

[link|http://www.badpage.info/seaside/html|Scrutinizer]
Expand Edited by tuberculosis Nov. 27, 2006, 02:35:03 PM EST
New This line of questioning wouldn't get you a candidate for
the job in the first place. You knocked out many worth while one even before you stop to consider you rant.
--
[link|mailto:greg@gregfolkert.net|greg],
[link|http://www.iwethey.org/ed_curry|REMEMBER ED CURRY!] @ iwethey
Freedom is not FREE.
Yeah, but 10s of Trillions of US Dollars?
SELECT * FROM scog WHERE ethics > 0;

0 rows returned.
New Whoa
The key issue was guiding you toward a reasonable doable solution, at least to get you thinking along those lines. But since you really don't have the programming background, you attempted a brute force response (which is typical for non-programmers) and then got upset when he refocused.

There is absolutely NO WAY your anticipated solution would have been workable, even with all the memory available in PAE. Didn't matter. So he pointed you back to the simple original requirements.

But rather than take the direction, you got silly.
New No he got downright rude.
--
[link|mailto:greg@gregfolkert.net|greg],
[link|http://www.iwethey.org/ed_curry|REMEMBER ED CURRY!] @ iwethey
Freedom is not FREE.
Yeah, but 10s of Trillions of US Dollars?
SELECT * FROM scog WHERE ethics > 0;

0 rows returned.
New I'm not trying to be rude
I'm trying to help you. When I say PAE is out of bounds - accept it and try a new avenue. Don't spend time defending it as a choice.

Your interviewer has specific things he wants to see.

This is no different than asking you to reverse a string, watching you allocate space for a new one, and then me refocusing to say "OK - but the string is ginormous and there is no more memory available - now what will you do". Because I want to see the in-place solution.

You said you want to be a programmer. I'm happy to help you achieve that and coach you through what passes as interviewing these days. The really-hard-puzzle-solve-this-on-the-spot interview is very popular now.




[link|http://www.blackbagops.net|Black Bag Operations Log]

[link|http://www.objectiveclips.com|Artificial Intelligence]

[link|http://www.badpage.info/seaside/html|Scrutinizer]
New Where did you get 1G?
SSNs are 9 digit numbers. There are a billion possible. You need 10 bytes to store one based on the scheme you mentioned. So you need 10G worst case to store the data.

What is the efficiency of your algorithm. IOW, as n increases, what happens to search time t?

How fast can you grep a 10G file? If its more than a few milliseconds it is too slow.

Having and being able to explain these numbers will get you a second interview. Not having them will get you a 'thanks for your time'. Its not even remotely academic in that environment. You have to do this kind of stuff with all kinds of data. Session ids, customer ids, sku codes, etc.

For fun, generate the file of all possible numbers, then grep 500000000. That's your average. Let me know how long it takes.



[link|http://www.blackbagops.net|Black Bag Operations Log]

[link|http://www.objectiveclips.com|Artificial Intelligence]

[link|http://www.badpage.info/seaside/html|Scrutinizer]
Expand Edited by tuberculosis Nov. 26, 2006, 10:50:05 PM EST
New Not all 9 digit numbers are valid.
As I'm sure you know.

[link|http://en.wikipedia.org/wiki/Social_Security_number|Wikipedia]

Valid SSNs

Currently, a valid SSN cannot have an area number above 772, the highest area number which the Social Security Administration has allocated.[3]

There are also special numbers which will never be allocated:

* Numbers with all zeros in any digit group (000-xx-xxxx, xxx-00-xxxx, xxx-xx-0000).
* Numbers of the form 666-xx-xxxx, probably due to the potential controversy (see Number of the Beast). Though the omission of this area number is not acknowledged by the SSA, it remains unassigned.
* Numbers from 987-65-4320 to 987-65-4329 are reserved for use in advertisements [citation needed].

The Administration publishes the last group number used for each area number. Since group numbers are allocated in a regular (if unusual) pattern, it is possible to identify an unissued SSN that contains an invalid group number. Despite these measures, many fraudulent SSNs cannot easily be detected using only publicly available information.


Thus, at the moment, there are a little less than 772M possible valid SS numbers. Hey, a 22% savings is nothing to sneeze at! :-)

Cheers,
Scott.
(Who would probably answer a question like this by saying: "I'm sure this is a well-known problem that has been solved many times over the last ~ 70 years. Give me a few minutes with Google and I'll get back to you." - because he's not a programmer.)
New I assumed so
but I didn't know what the rules were.

There is also, the obvious 50% optimization. If more than half the numbers are allocated, I only track those that are not. Otherwise I track those that are. Thus my worst case scenario moves to 50% allocated and my memory requirements are potentially cut by half.




[link|http://www.blackbagops.net|Black Bag Operations Log]

[link|http://www.objectiveclips.com|Artificial Intelligence]

[link|http://www.badpage.info/seaside/html|Scrutinizer]
New something you forgot, what lawyer to hire
since as a private company that usage is clearly illegal.
thanx,
bill
Any opinions expressed by me are mine alone, posted from my home computer, on my own time as a free american and do not reflect the opinions of any person or company that I have had professional relations with in the past 50 years. meep
New Nah
As long as it is not tied to any identifying information, using it to merely check the validity of a number entered in a credit card application (when the person already OKed it) would be fine.
New We talked about this one before
[link|http://z.iwethey.org/forums/render/content/show?contentid=183544|http://z.iwethey.org...?contentid=183544] :-)
Regards,

-scott anderson

"Welcome to Rivendell, Mr. Anderson..."
New We did - he didn't
Anyhow, I like this problem because the data set size is really close to the size of available ram and you can't tell by eyeballing it if you can make it fit or not. You have to think it through. Even then, the usual representations still don't quite fit. So you have to think harder. Which is the whole point.



[link|http://www.blackbagops.net|Black Bag Operations Log]

[link|http://www.objectiveclips.com|Artificial Intelligence]

[link|http://www.badpage.info/seaside/html|Scrutinizer]
Expand Edited by tuberculosis Nov. 26, 2006, 11:06:33 PM EST
New Just saying...
There's no need to repeat all of the stuff that was said before.
Regards,

-scott anderson

"Welcome to Rivendell, Mr. Anderson..."
New But we hardly talk about programming at all anymore
Did everyone move into management?



[link|http://www.blackbagops.net|Black Bag Operations Log]

[link|http://www.objectiveclips.com|Artificial Intelligence]

[link|http://www.badpage.info/seaside/html|Scrutinizer]
New :-)
I agree that having more programming discussions would be wonderful. Trouble is, we seem to have scared away many of the passionate programmers (BT, TSE, etc.). Even CRC doesn't post here as much as he used to. It's not fair of us to have you carry all the weight.

Admin's musings about "articles" would make discussions about programming have a longer life, I think, since they could be made sticky to get more eyeball time.

We need more eyeballs here...

Cheers,
Scott.
New I agree on the eyeballs.
I've posted at least two questions I wanted some opinions on and the discussion on one has been non-existent and the other has been limited, but interesting.

Wade.
"Don't give up!"
New I put most of my programming rants on my blog
For just the reason you cite. Blog is better bucket for articles. Stevey isn't the only one with drunken blog rants.

I suppose I could copy them here when I put up new ones.



[link|http://www.blackbagops.net|Black Bag Operations Log]

[link|http://www.objectiveclips.com|Artificial Intelligence]

[link|http://www.badpage.info/seaside/html|Scrutinizer]
New non programmer answer, append a 0,1 or2
to the soc giving you a 10 bit answer, the one is 0 if living 1 if dead 2 if green card or other legal method but not citizenship. The hardpart would be the monthy updates from the fed, doing a diff on the prior month and apply the changes would be the fastest method. SS numbers run from 001-574 so you are looking at a single 32 bit box for the web server. I would use PICK because it has screamingly fast native hashes as well as a web front end.
thanx,
bill
Any opinions expressed by me are mine alone, posted from my home computer, on my own time as a free american and do not reflect the opinions of any person or company that I have had professional relations with in the past 50 years. meep
New PICK?!?!?!?!?!?!?!?!?!?!?
WTF is wrong with you!
PICK!?
New dont knock it until you have tried it, recently
Any opinions expressed by me are mine alone, posted from my home computer, on my own time as a free american and do not reflect the opinions of any person or company that I have had professional relations with in the past 50 years. meep
New I've spent the last 20 years systematically killing it
I've had multiple contracts to reverse engineer and either recode or pull the data and dictionaries out and transfer to a vertical app on another box.

Feel free to point me to a "modern" Pick that deserves to live. OpenInsight? Only if it is too expensive to port to something else.

Pick sucks. Real pick, pick on dos, pick on windows, pick on Unix, pick sucks!

Dick Pick made his living selling shit to people who did not know any better, slapping shit together with no theoretical foundation. The only reason it has survived this long is vertical apps are locked into it and it is too painful to get it out.
New again, see what the needs are and then ask what the solution
runs on. PICK is a screamingly fast hash on flat files. Since I would have little desire to build my own I would buy (and cheaper for the client than my hourly to create) a hash. Just because people get locked into vertical solutions doesnt mean the underlying technology is unsound. Yes old stuff is hard to maintain and cheaper to move from the environment, that isnt the question on the table.
thanx,
bill
Any opinions expressed by me are mine alone, posted from my home computer, on my own time as a free american and do not reflect the opinions of any person or company that I have had professional relations with in the past 50 years. meep
New One trick pony
"Screamingly fast" hash access on one hand, horribly programming / operational environment on the other. Moder databases (free ones) can be just as fast as well - I have no faith in ANY Pick implementation. The goal is to lock you in so they milk you forever.
New If that is the trick you need use it
Any opinions expressed by me are mine alone, posted from my home computer, on my own time as a free american and do not reflect the opinions of any person or company that I have had professional relations with in the past 50 years. meep
New The key is to stay off the disk
It takes probably 1000 times longer to hit a disk than to hit RAM. Perhaps more.

Make it fit into memory.



[link|http://www.blackbagops.net|Black Bag Operations Log]

[link|http://www.objectiveclips.com|Artificial Intelligence]

[link|http://www.badpage.info/seaside/html|Scrutinizer]
New CramFS in RAM.
--
[link|mailto:greg@gregfolkert.net|greg],
[link|http://www.iwethey.org/ed_curry|REMEMBER ED CURRY!] @ iwethey
Freedom is not FREE.
Yeah, but 10s of Trillions of US Dollars?
SELECT * FROM scog WHERE ethics > 0;

0 rows returned.
New That's not a solution either
at best, its a piece of one, maybe.



[link|http://www.blackbagops.net|Black Bag Operations Log]

[link|http://www.objectiveclips.com|Artificial Intelligence]

[link|http://www.badpage.info/seaside/html|Scrutinizer]
New Okay, at this point...
You have to realize, I am currently an Implementor. A programmers asks for something, I give it to them or tell them what I need to be able to give them said thing.

I have not even started to delve into the underlying things that you are assuming I know.

I know systems design, not programming design. I know enough to know I don't know enough.

I need to get to the work on learning those pieces I need to know to program.

You should be able to see that by now.
--
[link|mailto:greg@gregfolkert.net|greg],
[link|http://www.iwethey.org/ed_curry|REMEMBER ED CURRY!] @ iwethey
Freedom is not FREE.
Yeah, but 10s of Trillions of US Dollars?
SELECT * FROM scog WHERE ethics > 0;

0 rows returned.
New Sure - just want to give you a taste
of what you're getting yourself into. And FWIW, programming is kind of hard to get into and it took me a long time to get there beating my head against the wall. Heck, I started in phone support and got tired of taking the heat for bonehead moves I didn't make. So I started to read and explore programming tools.

To be a software developer you need

1) A project. Something you want to write. Doing exercises only takes you so far.
2) A language. Any language, really. You should be able to write basic programs in some language. Understand ifs, loops, basic arrays.
3) Some theory.

Allow me to point you to some lite theory reading:

[link|http://www.amazon.com/Algorithms-Data-Structures-Electrical-Engineering/dp/1584502509/sr=1-7/qid=1164685832/ref=sr_1_7/103-9975159-5352656?ie=UTF8&s=books|This one] has good reviews and seems to cover the right stuff. I haven't read it in person.

[link|http://www.amazon.com/Introduction-Algorithms-Second-Thomas-Cormen/dp/0262032937/ref=cm_sylt_fullview_prod_txt_22/103-9975159-5352656/103-9975159-5352656|Cormen] is the bible in this area. But it is large, pricey, an not so approachable I think. Great reference handbook for the practicing professional though.

I rather like Sedgewick's books - he has editions for several different languages - C, C++, Pascal, Java... His exposition is good, but his coding is sloppy, there are many bugs in his examples.

I used to teach algorithm analysis from a book by [link|http://www.amazon.com/Structures-Algorithms-Addison-Wesley-Computer-Information/dp/0201000237/sr=1-1/qid=1164686185/ref=sr_1_1/103-9975159-5352656?ie=UTF8&s=books|Aho] that was pretty good.

These all cover mostly the same material.

I also used to teach the C and C++ classes at UCDenver and have a bunch of toy programming assignments if you want anything to practice on.





[link|http://www.blackbagops.net|Black Bag Operations Log]

[link|http://www.objectiveclips.com|Artificial Intelligence]

[link|http://www.badpage.info/seaside/html|Scrutinizer]
New personally I hate programming, I like debugging
oldies but goodies
Distributed Micro/Minicomputer Systems by Ca Weitzman
Unix Programming Manual Version 1 by Bell Labs
the C programming Language by Kernihan and Ritchie
Any O'Reilley book on modern scripting languages
Learning Java in 21 days
Mythical Man Month
thanx,
bill
Any opinions expressed by me are mine alone, posted from my home computer, on my own time as a free american and do not reflect the opinions of any person or company that I have had professional relations with in the past 50 years. meep
New Cheaper at Bookpool, btw.
[link|http://www.bookpool.com/ss?qs=cormen&x=0&y=0|http://www.bookpool....qs=cormen&x=0&y=0]

I never have book recommendations, because I rarely read books. Usually if I get a book, it's on a specific subject, like "Concurrent Programming In Java" or the like.
Regards,

-scott anderson

"Welcome to Rivendell, Mr. Anderson..."
New ICLRPD
I haven't read it in person.
===

Kip Hawley is still an idiot.

===

Purveyor of Doc Hope's [link|http://DocHope.com|fresh-baked dog biscuits and pet treats].
[link|http://DocHope.com|http://DocHope.com]
New Math isnt my strong suit but
5749999990-1010000000=4739999990*10(ten being the number of digits in each number)=47399999900/3/1024/1024=15068 megabytes. Trouble with 32 bit apps you can only address 2 gigs memory so lets work with a 1.5 gig addressable beyond the application overhead. As you can see only about 10% can inhabit active memory, that isnt too bad a cache percentage. Optimizing the cache itsef would be the trick and as an off the shelf kind of guy I would use someone elses, not build my own.
thanx,
bill
Any opinions expressed by me are mine alone, posted from my home computer, on my own time as a free american and do not reflect the opinions of any person or company that I have had professional relations with in the past 50 years. meep
New I hope you're going to outline the solution. :-)
Google is failing me, as I'm not using the right terms or there are common terms that are overlapping.

I did come across [link|http://www.google.com/url?sa=t&ct=res&cd=1&url=http%3A%2F%2Fresearch.microsoft.com%2Fbarc%2FSortBenchmark%2F2005_SCS_Wyllie.pdf&ei=s55rRd_qOoP4wQKU_fXsCg&usg=__KVTl6eKX9PSAv6WEIfrhOc4-1mw=&sig2=JWT74X4KjUPj2bhxbg31LQ|this] (7 page .pdf) paper by Wyllie of IBM Almaden on sorting a 1 TB file in 437 seconds that was pretty interesing.

Cheers,
Scott.
New Admin did it very nicely.
[link|http://z.iwethey.org/forums/render/content/show?contentid=183544|http://z.iwethey.org...?contentid=183544]
New D'Oh! Thanks.
New Cool sort paper
Largest file I ever sorted was about 80GB.
New Taking a stab
Convert the SSN's to 32 bit numbers, so each ssn takes 4 bytes. Pack them in an array (so there's no added overhead per entry). Simple search algorithm would be to start searching in the middle, and dividing each search in half from there.

Search efficiency: log2(n) [10^9 = 2^30]
Storage space per entry: 4 bytes
Worst case n: 10^9
Worst case space = 4 x 10^9 = 4,000,000,000 = 4gigs
New Still doesn't beat O(1) :-)
Regards,

-scott anderson

"Welcome to Rivendell, Mr. Anderson..."
New Ok, I read the answer you gave...
...makes sense for the constraints given in the question. :-)
     Steve Yegge (who he? - ed) on Perl - (pwhysall) - (62)
         IRLRPD. (new thread) - (Another Scott)
         He worked for Amazon for many years - (tuberculosis) - (60)
             Well based on that ... - (folkert) - (59)
                 Key factors are - (tuberculosis) - (58)
                     do it all the time - (boxley)
                     I have never had an under spec'd system. - (folkert) - (56)
                         More to the point is will it fit on one machine? - (tuberculosis) - (55)
                             I guess I think to simply. - (folkert) - (22)
                                 You would filewalk for every query? - (crazy) - (18)
                                     RAMFS? -NT - (folkert) - (17)
                                         Doesn't matter - (crazy) - (5)
                                             That would be a radix-search, I think. - (static) - (3)
                                                 Nah, you want a hash - (tuberculosis) - (2)
                                                     I thought a radix search was also O(1). - (static) - (1)
                                                         Gah - my bad - (tuberculosis)
                                             Also, I just thought about your file size - (crazy)
                                         On a 32 bit machine? -NT - (tuberculosis) - (10)
                                             Ever heard of PAE? - (folkert) - (9)
                                                 I'd call that exotic technology -NT - (tuberculosis) - (8)
                                                     No, it is bone stock on everything since the P3 from Intel - (folkert) - (7)
                                                         And AMD? PowerPC? ARM? - (tuberculosis) - (6)
                                                             You said COMMON 32-bit - (folkert) - (5)
                                                                 That kind of weasel won't get you a job - (tuberculosis) - (4)
                                                                     This line of questioning wouldn't get you a candidate for - (folkert) - (3)
                                                                         Whoa - (crazy) - (2)
                                                                             No he got downright rude. -NT - (folkert) - (1)
                                                                                 I'm not trying to be rude - (tuberculosis)
                                 Where did you get 1G? - (tuberculosis) - (2)
                                     Not all 9 digit numbers are valid. - (Another Scott) - (1)
                                         I assumed so - (tuberculosis)
                             something you forgot, what lawyer to hire - (boxley) - (1)
                                 Nah - (crazy)
                             We talked about this one before - (admin) - (6)
                                 We did - he didn't - (tuberculosis) - (5)
                                     Just saying... - (admin) - (4)
                                         But we hardly talk about programming at all anymore - (tuberculosis) - (3)
                                             :-) - (Another Scott) - (2)
                                                 I agree on the eyeballs. - (static)
                                                 I put most of my programming rants on my blog - (tuberculosis)
                             non programmer answer, append a 0,1 or2 - (boxley) - (15)
                                 PICK?!?!?!?!?!?!?!?!?!?!? - (crazy) - (5)
                                     dont knock it until you have tried it, recently -NT - (boxley) - (4)
                                         I've spent the last 20 years systematically killing it - (crazy) - (3)
                                             again, see what the needs are and then ask what the solution - (boxley) - (2)
                                                 One trick pony - (crazy) - (1)
                                                     If that is the trick you need use it -NT - (boxley)
                                 The key is to stay off the disk - (tuberculosis) - (8)
                                     CramFS in RAM. -NT - (folkert) - (6)
                                         That's not a solution either - (tuberculosis) - (5)
                                             Okay, at this point... - (folkert) - (4)
                                                 Sure - just want to give you a taste - (tuberculosis) - (3)
                                                     personally I hate programming, I like debugging - (boxley)
                                                     Cheaper at Bookpool, btw. - (admin)
                                                     ICLRPD - (drewk)
                                     Math isnt my strong suit but - (boxley)
                             I hope you're going to outline the solution. :-) - (Another Scott) - (3)
                                 Admin did it very nicely. - (crazy) - (1)
                                     D'Oh! Thanks. -NT - (Another Scott)
                                 Cool sort paper - (crazy)
                             Taking a stab - (ChrisR) - (2)
                                 Still doesn't beat O(1) :-) -NT - (admin) - (1)
                                     Ok, I read the answer you gave... - (ChrisR)

Then again you wonder how certain people sign their names to anything and still choose to leave the house.
304 ms