The Eternal Sunshine of the Relational Model

I cannot count the number of times customers or prospects have asked us to remap a relational database structure into a directory, or vice versa. Maintaining a separate, different database structure for directories has been, and remains, a very expensive proposition for most companies.

So let’s take a look at why we need to have a separate database structure, and how we could reduce the pain of synchronization.

Now, in my previous post, I asked why we need directories in the first place. But I’d like to refine that question a little further:

* Do we need a hierarchical structure like a directory? (I’ll be discussing this question in future blog posts, so stay tuned!)
* And the issue behind today’s post: do we need a separate, different kind of database to support such a structure? (After all, didn’t we all learn that relational theory is the alpha and the omega, back in Database 101 class?)

Let’s face it, relational databases are:

* The workhorse of most enterprise applications.
* Well understood, well supported, and transaction-enabled.
* Optimized for updates.
* Potentially able to answer all kinds of queries.

So why not implement a directory using a relational model on top of a standard SQL engine, and avoid the complexity of having to synchronize two very different data models?


Two Possible Directions, Two Big Dilemmas

If we build a directory using relational best practices, we’ll end up with a model that is “standard” and well understood and well integrated with other SQL applications. So far, so good, right? In terms of design, we will likely end up either with a recursive relation, or a static number of joins for a predefined depth of our tree. The problem is that the result of our queries would be atrociously slow—no matter how you slice it, multiple joins or recursive queries are very expensive with mainstream SQL implementations. So this approach won’t work because a directory needs fast lookup/queries.
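To make the cost concrete, here is a minimal sketch of that "relational best practices" design: an adjacency list walked with a recursive query. The schema, names, and data are my own assumptions for illustration, using SQLite.

```python
import sqlite3

# Hypothetical adjacency-list schema: each entry points to its parent,
# the classic "recursive relation" design.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE entry (
        id     INTEGER PRIMARY KEY,
        parent INTEGER REFERENCES entry(id),
        name   TEXT NOT NULL
    );
    INSERT INTO entry VALUES
        (1, NULL, 'com'),
        (2, 1,    'example'),
        (3, 2,    'people'),
        (4, 3,    'alice');
""")

# Resolving one entry's full path requires a recursive query: one
# self-join per tree level, which is exactly what gets expensive
# at depth and at volume.
names = [row[0] for row in conn.execute("""
    WITH RECURSIVE path(id, parent, name) AS (
        SELECT id, parent, name FROM entry WHERE id = 4
        UNION ALL
        SELECT e.id, e.parent, e.name
        FROM entry e JOIN path p ON e.id = p.parent
    )
    SELECT name FROM path;
""")]
print("/".join(reversed(names)))  # com/example/people/alice
```

One lookup here touches as many rows as the entry is deep; a directory serving thousands of lookups per second cannot afford that per-read join cost.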

Or we could bypass best practices and build a "hard-coded" hierarchical structure that just runs on top of a SQL database. In fact, some vendors took this approach (IBM and Oracle, I believe) and it works well. But the approach is so specific that we're basically building a hierarchical database: a separate hierarchical structure using SQL as an access method! This gives us excellent results in terms of speed. But now we're back to square one: two different database structures, two different models, and the same old problem of synchronization. This approach buys us speed but creates exactly the impedance headache that our DBA wanted to avoid.
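A minimal sketch of this second approach, again with names and schema that are my own assumptions rather than any vendor's actual design: each entry stores its full path as an indexed string, so reads are cheap, but the table no longer looks relational in spirit.

```python
import sqlite3

# Hypothetical "hard-coded" hierarchy: the full DN-like path is
# materialized into one indexed column instead of being derived
# from parent/child relations.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE entry (
        path  TEXT PRIMARY KEY,  -- e.g. 'com/example/people/alice'
        attrs TEXT               -- attribute payload, format left open
    );
    INSERT INTO entry VALUES
        ('com/example/people/alice', 'cn=alice'),
        ('com/example/people/bob',   'cn=bob');
""")

# Exact lookup: a single index probe, no join per tree level.
hit = conn.execute(
    "SELECT attrs FROM entry WHERE path = ?",
    ("com/example/people/alice",)).fetchone()

# Subtree search: a prefix scan over the same index.
subtree = conn.execute(
    "SELECT path FROM entry WHERE path LIKE ?",
    ("com/example/people/%",)).fetchall()
```

The speed comes from denormalizing the hierarchy into the key, which is precisely why this is a hierarchical database wearing a SQL access method, not a relational model.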

(In fact, if we're forgoing the relational model, why not bypass SQL altogether and use the core access-method engine (ISAM/B*-trees) to win some extra speed? That's how the majority of LDAP implementations have been built.)

The Best of Both Worlds: Caching In…

So on one side we have a highly scalable system, optimized for transactions and updates, that's ideal for storing our directory data. It could deliver all the queries we need, just not at the right speed, because the multiple joins of a recursive query kill our real-time/online performance. But what if we virtualize and cache the results of all queries beforehand, and update this cache incrementally in real time as new entries are created?

That would give us the best of both worlds: the infinite flexibility and scalability of the relational model, and the speed of a directory. On top of that, we also get the rich context that a hierarchical structure, such as a directory, can carry, while benefiting from a well-established, well-understood entity-relationship model. So when we write, update, or insert, we're working in the classic SQL model, with the transactions our application loves and understands. And when we query, we get the speed of a directory, because the cache is the directory.
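The write-through idea can be sketched in a few lines. This is a toy illustration, not the actual product architecture, and every name in it is hypothetical: writes land in the relational "source of truth" and incrementally refresh each materialized query result, so reads never pay the join cost.

```python
# Toy sketch of "the cache is the directory": registered queries are
# materialized once, then kept current on every insert.
class CachedDirectory:
    def __init__(self):
        self.rows = []    # the relational "source of truth"
        self.cache = {}   # query key -> (predicate, precomputed result)

    def register_query(self, key, predicate):
        # Materialize the result once; later inserts keep it current.
        self.cache[key] = (predicate, [r for r in self.rows if predicate(r)])

    def insert(self, row):
        # Write path: update the base table, then incrementally
        # update each affected cached result.
        self.rows.append(row)
        for predicate, result in self.cache.values():
            if predicate(row):
                result.append(row)

    def query(self, key):
        # Read path: no joins, no recursion -- just the cached answer.
        predicate, result = self.cache[key]
        return result

d = CachedDirectory()
d.insert({"cn": "alice", "ou": "people"})
d.register_query("all-people", lambda r: r["ou"] == "people")
d.insert({"cn": "bob", "ou": "people"})
d.insert({"cn": "printer1", "ou": "devices"})
print([r["cn"] for r in d.query("all-people")])  # ['alice', 'bob']
```

The design trade is the classic one for materialized views: each write does a little extra work so that every read is a simple lookup.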

One final thing: since all the data from an existing application could now be virtualized as a set of context trees, an interesting thing happens: we invert the relationship between the directory and the rest of the infrastructure. The directory no longer sits in splendid isolation; instead, it's fed by a grassroots effort of "publishing" which could be consumed by the community. (Sounds a little like the web, right?)

But I'm getting ahead of myself here…let's dive into this in my next post, where I'll cover hierarchy, directory, and context.

Michel Prompt

About Michel Prompt

Michel is a world-renowned developer, researcher and entrepreneur who most recently founded Radiant Logic, a software company focused on integrating Directory Services (LDAP), XML and databases to enable companies to easily locate and combine information housed in disparate databases and better manage e-business. Prior to founding Radiant Logic in 1995, Prompt served as Senior Vice President of Client/Server Technology at Knowledgeware, now known as Sterling Software. In 1986, Prompt founded Matesys SA in France, a company dedicated to providing services and database support. Matesys introduced the first "file-manager," duplicating the "Apple Macintosh finder" under Windows 2.1 and sold 500,000 copies to IBM for academic bundles. Prompt founded Matesys Corp., in the U.S., in 1991, becoming one of the pioneer companies in client/server technology. Matesys introduced one of the first visual programming tools under Windows 3.0 for the client/server market (Object View). Prompt successfully sold the company to Knowledgeware in 1993. Prior to Matesys, Prompt was a core developer in the database group of the GCOS 7.0 Operating System for Bull Systems. He also served as a consultant to Cap Sogeti Gemini, one of the largest IT service companies in Europe. Prompt received his diploma from Institut Politiques de Paris (Political Sciences Institute of Paris). He has a Masters degree in applied mathematics from Paris Dauphine University and received a diploma of advanced studies in computer science from Paris Dauphine University.

