You've got to walk through each bottleneck in an access path and see where requests start piling up.
How fast can a program read a file, even if it's sitting on a pure RAM disk?
Hmm - I seem to recall moving data at about 300MB per second when walking through a fully cached 1GB file on an Opteron, so a full scan takes roughly 3 seconds. If on average you find what you're looking for halfway through, call it 1.5 seconds for that piece of the puzzle.
So you can serve 6 or 7 queries in about 10 seconds, which is my absolute worst case - the longest I'll wait before hitting the STOP button.
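A rough sketch of how you'd sanity-check that scan-rate number yourself: time a linear search over an in-memory buffer, the way a linear search over a fully cached file behaves. The 100 MB size is an arbitrary stand-in for the 1 GB file, not a number from the original.

```python
import time

# Back-of-envelope check of raw scan speed: search an in-memory buffer
# the way a linear search over a cached file would.
SIZE = 100 * 1024 * 1024          # 100 MB stand-in for the 1 GB file
buf = bytes(SIZE)                 # all zero bytes

t0 = time.perf_counter()
found = buf.find(b"\x01")         # worst case: scans the whole buffer (-1 = not found)
elapsed = time.perf_counter() - t0

rate = SIZE / elapsed / (1024 * 1024)
print(f"scanned at {rate:.0f} MB/s")
```

Modern hardware will come out well past that old Opteron's 300 MB/s, but the shape of the math is the same: buffer size divided by scan rate, halved for the average hit.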
But brute-force scanning like that is really a waste of activity anyway.
So then your choice is to start working through access and storage. Preferably you code a single program that loads the data into memory, has very quick access to it, and publishes a server interface (a simple TCP/IP socket, etc.) that other programs can use.
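A minimal sketch of that load-once, serve-many shape, assuming the simplest possible protocol (one newline-terminated key per connection, "yes"/"no" back). The dataset and key names here are made up for illustration; the real program would load its data from your file at startup.

```python
import socket
import threading

# Hypothetical in-memory dataset, loaded once at startup instead of
# being re-read from disk on every query.
DATA = {"alpha", "beta", "gamma"}

def serve_forever(srv):
    # Accept one connection at a time; each client sends a
    # newline-terminated key and gets back "yes" or "no".
    while True:
        conn, _ = srv.accept()
        with conn:
            key = conn.makefile().readline().strip()
            conn.sendall(b"yes\n" if key in DATA else b"no\n")

def query(port, key):
    # What a client program does instead of scanning the file itself.
    with socket.create_connection(("127.0.0.1", port)) as c:
        c.sendall(key.encode() + b"\n")
        return c.makefile().readline().strip()

srv = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
srv.bind(("127.0.0.1", 0))          # port 0: let the OS pick a free port
srv.listen()
port = srv.getsockname()[1]
threading.Thread(target=serve_forever, args=(srv,), daemon=True).start()

print(query(port, "beta"))   # yes
print(query(port, "delta"))  # no
```

Note that opening a fresh connection per query, as `query` does here, is exactly the socket-creation cost that shows up as the next bottleneck once the lookup itself gets fast.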
And then you think in terms of memory usage. You don't actually need to store the data if you can come up with a representation of its existence. You'd rather use fewer resources if possible, as long as it doesn't slow down your access.
I seem to recall this particular task lends itself to bitmapped scoreboarding. Also, with a bit of knowledge of the data type, I seem to recall only certain ranges of numbers are used.
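A minimal sketch of that bitmapped scoreboard, assuming the values are integers in a known range - the 0 to 9,999,999 range and the sample values below are invented for illustration, not from the original data.

```python
# One bit per possible value: existence only, not the values themselves.
RANGE = 10_000_000
bitmap = bytearray(RANGE // 8 + 1)    # ~1.2 MB covers the whole range

def mark(n):
    # Set the bit for value n.
    bitmap[n >> 3] |= 1 << (n & 7)

def exists(n):
    # O(1) lookup: test the bit for value n.
    return bool(bitmap[n >> 3] & (1 << (n & 7)))

for n in (42, 1_000_000, 9_999_999):  # pretend these came from the file
    mark(n)

print(exists(42))       # True
print(exists(43))       # False
```

If only certain sub-ranges of numbers are actually used, you can shrink this further by mapping just those ranges into the bitmap instead of the full span.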
So when you compress what you store and add a bit of smarts to the search method, you go from serving a couple of dozen queries a minute to a couple of thousand a second. At that point you find that network socket creation, rather than the actual query, becomes the bottleneck, so then you might move to optimizing that. Or you may find that a couple of cheap load-balanced machines covers your complete universe of possible usage, so you stop.