So, I have been dabbling...err

working on an LDAP schema that is (to be honest) a worrysome amount of data, just for the schema objects. I am trying to figure out how to make schema extension to cover ALL Commercial and other Directories. I want >95% compatibility/interoperability for eDIR, ADS, NIS/NIS+, DNS, SLP, Kerberos v5, SunONE/iPlanet, Oracle's OID, IBM's Tivoli DS and Nexor Directory. (prolly any other I can find too)

So, I look at what my choice in LDAP Server, OpenLDAP, uses for storing info, by default. Sleepycat DB4. Filesystem directory storage, HUH!?

OMFG! WAH?

Well, I want to use PostgresQL. Not just for the DB end of it, but for the relational features. I dunno if I this is wise, but it sure the hell makes more sense than the sleepycat stuff. Yes, I know the Sleepycat is mature, it is optimized... but come on.

So, now I have to think about a database schema now too. Sheesh enough with the schemas already.

So here goes with the Database schema only. And I would like some comments, remembering this is my first real "all by myself" data normalization attempt. Also, I haven't really paid much attention to POstgresql's limitations on indexing or what not, so these thing may change.

Data Entry table. This table holds the information the LDAP entries. This for assigning the UniqueID of an entry and to support LDAP_SCOPE_ONELEVEL and LDAP_SCOPE_BASE searching scopes. The parent and child table is in the entry table since all the other attributes are dependent on UniqueID. The columns for this table:

ColumnName Description Indexed?
UniqueID The UniqueID of the LDAP record Yes
ParentUniqueID The UniqueID of the parent LDAP record for the LDAP directory heirarchy scheme No
DN The distinguished name of the entry No
DN_SHRT Short DN upto 150 characters of the DN for "meta" lookup Special
RecordData Entries are stored using ASCII in the form of "attribute: value"
Non-ASCII OR binary values are base64 encoded No
Creator The DN or UniqueID^{[link|#one|1]} of the record creator No
create_timestamp Records the time when the entry was created No
Modifier The DN or UniqueID^{[link|#one|1]} of the last record modifier^{[link|#two|2]} No
modify_timestamp Records the time when the entry was last modified^{[link|#two|2]} No

Searchable Attribute table(s). There is one attribute table per searchable attribute.(e.g. 800 searchable attributes == 800 Atributes Tables named after the attribute, for scalability(I hope)) Each LDAP entry is assigned a unique identifier (UniqueID) by the store. The columns in this table are:

ColumnName Description Indexed?
UniqueID The unique identifier of the LDAP entry Yes
AttribValue Normalized attribute values No
ShrtAttribValue If AttribValue is < 151 bytes then == AtrribValue,
If AttribValue is >= 151 bytes then == AtrribValue trunc @ 150 bytes Special

Descendant table. This table is for the subtree search of LDAP. Each LDAP entry with a UniqueID (AncestUniqueID), this table contains the unique identifiers (DescendUniqueID) of the descendant entries. The columns in this table are:

ColumnName Description Indexed?
AncestUniqueID The unique identifier of the ancestor LDAP entry Yes
DescendUniqueID The unique identifier of the descendant LDAP entry Yes
For every entry in the directory, a row exists in this table for each of its ancestors including itself. If every entry were exactly the same depth, record count at worst case == O(nm), where n is the node count in the directory and m would be the depth of the tree. This could be my achilles heel

So, how is this design, I did have some help... I also referenced a few private docs I have from years ago when I was debuging an NDS schema error.

Now, once I get advice and make changes, I'll have to create or build LDAP<->PGSQL filters. Maybe stored procedures would be better. I'll also have to finger out how to bulk load and so on. Replication shouldn't be an issue, I can do it with the LDAP implementation or the DB implementation. Even Clustering shouldn't be too hard for this...

I do know I have set lofty goals for this... but if I can just make sure the design is good, fixing problems as they arise... things should be good, giving me the ability to extend and addto the setup as needed. As well as multiple methods for assurance.

Footnotes:

DN or UniqueID, that is the question. I am unsure which one will be most efficient/fastest.
At this point Last record modifier only. Perhaps later, split this table out to a "Modifier Table" with RecordNum, ModifiedUniqueID, ModifierUniqueID and ModifiedTimeStamp as the columns.

Welcome to IWETHEY!