IWETHEY v. 0.3.0 | TODO
1,095 registered users | 1 active user | 0 LpH | Statistics
Login | Create New User
IWETHEY Banner

Welcome to IWETHEY!

New Doesn't matter
You've got to go through each bottleneck during an access and then see where people start piling up.

How fast can a program read a file, even if it is pure ram disk.

Hmm - I seem to recall moving data at 300MB per second when walking through a fully cached 1GB file on an Opteron - so let's say if you average out finding stuff in 1/2 the time, ie: 1.5 seconds for that piece of the puzzle.

So, you can server 6 or 7 queries in about 10 seconds, which is what I consider an absolute, worst case, will not wait any longer until I hit the STOP button.

But it is really a waste of activity anyway.

So then your choice is to start working through access and storage. It is preferable if you can code a single program to load the data in memory, and this program has very quick access to the data, and then this program can publish a server interface (simple tcp/ip socket, etc), which other programs can use.

And then you think in terms of memory usage. You don't really need to actually store the data if you can come up with a representation of existence. You'd rather use less resources if possible, as long as it doesn't slow down your access.

I seem to recall this particular task lends itself to bitmapped scoreboarding. Also, which a bit of knowledge of the data type, I seem to recall only certain ranges of numbers are used.

So when you end up compressing what you store, and add a bit of smarts to the search method, you go from serving a couple of dozen query a minute to a couple of thousand a second. At that point you find your networking socket creation becomes the bottleneck, rather than the actual query, so then you might move tho optimizing that. Or you may find a couple of cheap load balanced machines takes care of your complete universe of possible usage, so you stop.
New That would be a radix-search, I think.
Ideal for numeric data like SSNs.
"Don't give up!"
New Nah, you want a hash
You always want a hash. O(1) is really hard to beat.



[link|http://www.blackbagops.net|Black Bag Operations Log]

[link|http://www.objectiveclips.com|Artificial Intelligence]

[link|http://www.badpage.info/seaside/html|Scrutinizer]
New I thought a radix search was also O(1).
Perhaps I'm misremembering, but a radix-search is very similar to a hash lookup, but uses the value itself directly rather than hashing it first. I think in practice it's a little more complicated than that, but that's the idea.

Wade.
"Don't give up!"
New Gah - my bad
I was thinking of radix sort - which is O(nk) where k is key size.

Search is, as you say, proportional to k or O(1).



[link|http://www.blackbagops.net|Black Bag Operations Log]

[link|http://www.objectiveclips.com|Artificial Intelligence]

[link|http://www.badpage.info/seaside/html|Scrutinizer]
New Also, I just thought about your file size
It is a LOT worse than what you thought if you actually have each number, so our initial query time for a filewalk just jumped tremendously.
     Steve Yegge (who he? - ed) on Perl - (pwhysall) - (62)
         IRLRPD. (new thread) - (Another Scott)
         He worked for Amazon for many years - (tuberculosis) - (60)
             Well based on that ... - (folkert) - (59)
                 Key factors are - (tuberculosis) - (58)
                     do it all the time - (boxley)
                     I have never had an under spec'd system. - (folkert) - (56)
                         More to the point is will it fit on one machine? - (tuberculosis) - (55)
                             I guess I think to simply. - (folkert) - (22)
                                 You would filewalk for every query? - (crazy) - (18)
                                     RAMFS? -NT - (folkert) - (17)
                                         Doesn't matter - (crazy) - (5)
                                             That would be a radix-search, I think. - (static) - (3)
                                                 Nah, you want a hash - (tuberculosis) - (2)
                                                     I thought a radix search was also O(1). - (static) - (1)
                                                         Gah - my bad - (tuberculosis)
                                             Also, I just thought about your file size - (crazy)
                                         On a 32 bit machine? -NT - (tuberculosis) - (10)
                                             Ever heard of PAE? - (folkert) - (9)
                                                 I'd call that exotic technology -NT - (tuberculosis) - (8)
                                                     No, it is bone stock on everything since the P3 from Intel - (folkert) - (7)
                                                         And AMD? PowerPC? ARM? - (tuberculosis) - (6)
                                                             You said COMMON 32-bit - (folkert) - (5)
                                                                 That kind of weasel won't get you a job - (tuberculosis) - (4)
                                                                     This line of questioning wouldn't get you a candidate for - (folkert) - (3)
                                                                         Whoa - (crazy) - (2)
                                                                             No he got downright rude. -NT - (folkert) - (1)
                                                                                 I'm not trying to be rude - (tuberculosis)
                                 Where did you get 1G? - (tuberculosis) - (2)
                                     Not all 9 digit numbers are valid. - (Another Scott) - (1)
                                         I assumed so - (tuberculosis)
                             something you forgot, what lawyer to hire - (boxley) - (1)
                                 Nah - (crazy)
                             We talked about this one before - (admin) - (6)
                                 We did - he didn't - (tuberculosis) - (5)
                                     Just saying... - (admin) - (4)
                                         But we hardly talk about programming at all anymore - (tuberculosis) - (3)
                                             :-) - (Another Scott) - (2)
                                                 I agree on the eyeballs. - (static)
                                                 I put most of my programming rants on my blog - (tuberculosis)
                             non programmer answer, append a 0,1 or2 - (boxley) - (15)
                                 PICK?!?!?!?!?!?!?!?!?!?!? - (crazy) - (5)
                                     dont knock it until you have tried it, recently -NT - (boxley) - (4)
                                         I've spent the last 20 years systematically killing it - (crazy) - (3)
                                             again, see what the needs are and then ask what the solution - (boxley) - (2)
                                                 One trick pony - (crazy) - (1)
                                                     If that is the trick you need use it -NT - (boxley)
                                 The key is to stay off the disk - (tuberculosis) - (8)
                                     CramFS in RAM. -NT - (folkert) - (6)
                                         That's not a solution either - (tuberculosis) - (5)
                                             Okay, at this point... - (folkert) - (4)
                                                 Sure - just want to give you a taste - (tuberculosis) - (3)
                                                     personally I hate programming, I like debugging - (boxley)
                                                     Cheaper at Bookpool, btw. - (admin)
                                                     ICLRPD - (drewk)
                                     Math isnt my strong suit but - (boxley)
                             I hope you're going to outline the solution. :-) - (Another Scott) - (3)
                                 Admin did it very nicely. - (crazy) - (1)
                                     D'Oh! Thanks. -NT - (Another Scott)
                                 Cool sort paper - (crazy)
                             Taking a stab - (ChrisR) - (2)
                                 Still doesn't beat O(1) :-) -NT - (admin) - (1)
                                     Ok, I read the answer you gave... - (ChrisR)

Actually validated for HTML 4.01 Transitional compliance.
118 ms