So, I look at what my choice in LDAP Server, OpenLDAP, uses for storing info, by default. Sleepycat DB4. Filesystem directory storage, HUH!?
OMFG! WAH?
Well, I want to use PostgresQL. Not just for the DB end of it, but for the relational features. I dunno if I this is wise, but it sure the hell makes more sense than the sleepycat stuff. Yes, I know the Sleepycat is mature, it is optimized... but come on.
So, now I have to think about a database schema now too. Sheesh enough with the schemas already.
So here goes with the Database schema only. And I would like some comments, remembering this is my first real "all by myself" data normalization attempt. Also, I haven't really paid much attention to POstgresql's limitations on indexing or what not, so these thing may change.
Data Entry table. This table holds the information the LDAP entries. This for assigning the UniqueID of an entry and to support LDAP_SCOPE_ONELEVEL and LDAP_SCOPE_BASE searching scopes. The parent and child table is in the entry table since all the other attributes are dependent on UniqueID. The columns for this table:
Searchable Attribute table(s). There is one attribute table per searchable attribute.(e.g. 800 searchable attributes == 800 Atributes Tables named after the attribute, for scalability(I hope)) Each LDAP entry is assigned a unique identifier (UniqueID) by the store. The columns in this table are:
ColumnName Description Indexed? UniqueID The UniqueID of the LDAP record Yes ParentUniqueID The UniqueID of the parent LDAP record for the LDAP directory heirarchy scheme No DN The distinguished name of the entry No DN_SHRT Short DN upto 150 characters of the DN for "meta" lookup Special RecordData Entries are stored using ASCII in the form of "attribute: value"
Non-ASCII OR binary values are base64 encodedNo Creator The DN or UniqueID[link|#one|1] of the record creator No create_timestamp Records the time when the entry was created No Modifier The DN or UniqueID[link|#one|1] of the last record modifier[link|#two|2] No modify_timestamp Records the time when the entry was last modified[link|#two|2] No
Descendant table. This table is for the subtree search of LDAP. Each LDAP entry with a UniqueID (AncestUniqueID), this table contains the unique identifiers (DescendUniqueID) of the descendant entries. The columns in this table are:
ColumnName Description Indexed? UniqueID The unique identifier of the LDAP entry Yes AttribValue Normalized attribute values No ShrtAttribValue If AttribValue is < 151 bytes then == AtrribValue,
If AttribValue is >= 151 bytes then == AtrribValue trunc @ 150 bytesSpecial
ColumnName Description Indexed? AncestUniqueID The unique identifier of the ancestor LDAP entry Yes DescendUniqueID The unique identifier of the descendant LDAP entry Yes For every entry in the directory, a row exists in this table for each of its ancestors including itself. If every entry were exactly the same depth, record count at worst case == O(nm), where n is the node count in the directory and m would be the depth of the tree. This could be my achilles heel
So, how is this design, I did have some help... I also referenced a few private docs I have from years ago when I was debuging an NDS schema error.
Now, once I get advice and make changes, I'll have to create or build LDAP<->PGSQL filters. Maybe stored procedures would be better. I'll also have to finger out how to bulk load and so on. Replication shouldn't be an issue, I can do it with the LDAP implementation or the DB implementation. Even Clustering shouldn't be too hard for this...
I do know I have set lofty goals for this... but if I can just make sure the design is good, fixing problems as they arise... things should be good, giving me the ability to extend and addto the setup as needed. As well as multiple methods for assurance.
Footnotes: