that isn't boiling down to anything that gives you a big win. Not when 300 million pointers take that much space.

The knowledge that it is "gap-y" could provide a space win, but only at some performance cost. You'd have to store the data in carefully compressed form and then decompress it before use. For instance, you could have a table that maps each 64K block of social security numbers to a chunk of data holding Scott's format, compressed with some standard algorithm. To look up a social security number you decompress the appropriate block and then check your entry in it.

Well filled-in stretches and gaps should both compress well. However, now every lookup involves decompressing a whole block of data rather than just testing a bit in a bitmap. This can only be recommended as a win if the data is seriously uneven and 128 MB of RAM simply can't be had.
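To make the tradeoff concrete, here's a minimal sketch of the block-compression idea in Python. The class name, block size, and use of zlib are my own assumptions for illustration, not anything Scott specified; the point is just that membership tests now pay a per-lookup decompression cost:

```python
import zlib

BLOCK_SIZE = 65536            # SSNs per block (the 64K granularity above)
BLOCK_BYTES = BLOCK_SIZE // 8

class CompressedBitmap:
    """Hypothetical bitmap over the SSN space, stored as
    independently zlib-compressed 64K-entry blocks."""

    def __init__(self, ssns):
        # Build plain bitmap blocks, then compress each one.
        blocks = {}
        for ssn in ssns:
            block_no, offset = divmod(ssn, BLOCK_SIZE)
            block = blocks.setdefault(block_no, bytearray(BLOCK_BYTES))
            block[offset // 8] |= 1 << (offset % 8)
        self.blocks = {n: zlib.compress(bytes(b)) for n, b in blocks.items()}

    def __contains__(self, ssn):
        block_no, offset = divmod(ssn, BLOCK_SIZE)
        comp = self.blocks.get(block_no)
        if comp is None:
            return False  # whole block is a gap: nothing stored at all
        # Every hit on a populated block pays for a full decompression.
        block = zlib.decompress(comp)
        return bool(block[offset // 8] & (1 << (offset % 8)))

bm = CompressedBitmap([123456789, 123456790, 987654321])
print(123456789 in bm)  # True
print(123456788 in bm)  # False
```

Gap blocks cost nothing beyond a dictionary miss, and runs of mostly-set or mostly-clear bits compress very well, but the decompress-per-lookup cost in `__contains__` is exactly what makes the plain 128 MB bitmap more attractive when the RAM is available.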

Given the cost of RAM, I'd go with Scott's solution.

Cheers,
Ben