I'd sure like to see sample exclusion data. Using the first six digits (area code + prefix) to look up a lazily-created bit vector for the last four digits might save the most memory. Area codes are sparse, and within some area codes so are prefixes. The combination might minimize the number of bit vectors you'd have to create to probe for the final four digits.
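Here's a rough sketch of what I mean, in Python. The class name `ExclusionSet` and its methods are made up for illustration, and I'm assuming 10-digit numbers (area code + prefix + four-digit line number). Without seeing real exclusion data I can't say this is the best layout, so treat it as a starting point only.

```python
class ExclusionSet:
    """Hypothetical sparse set of 10-digit phone numbers.

    A dict maps the first six digits to a bit vector covering the
    10,000 possible last-four-digit suffixes. A vector is allocated
    only when the first number with that six-digit key is added.
    """

    BITS = 10_000        # one bit per possible four-digit suffix
    BYTES = BITS // 8    # 1,250 bytes per bit vector

    def __init__(self):
        self._vectors = {}  # six-digit key (str) -> bytearray

    @staticmethod
    def _split(number):
        # Assumption: punctuation is ignored and exactly 10 digits remain.
        digits = "".join(ch for ch in number if ch.isdigit())
        if len(digits) != 10:
            raise ValueError(f"expected 10 digits, got {len(digits)}")
        return digits[:6], int(digits[6:])

    def add(self, number):
        key, suffix = self._split(number)
        vec = self._vectors.get(key)
        if vec is None:
            # Lazy creation: no memory is spent on unused prefixes.
            vec = self._vectors[key] = bytearray(self.BYTES)
        vec[suffix >> 3] |= 1 << (suffix & 7)

    def __contains__(self, number):
        key, suffix = self._split(number)
        vec = self._vectors.get(key)
        if vec is None:
            return False  # nothing excluded under this area code + prefix
        return bool(vec[suffix >> 3] & (1 << (suffix & 7)))

    def vector_count(self):
        """Number of bit vectors actually allocated."""
        return len(self._vectors)
```

Usage would look something like `excluded = ExclusionSet(); excluded.add("212-555-0142"); "2125550142" in excluded`. Each allocated vector costs 1,250 bytes, so total memory scales with the number of distinct area code + prefix combinations that appear in the exclusion list, not with the full number space.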