Posted by: camz | April 15, 2007

Objective Relations II: Seductive Identity


I’ve been doing a lot of thinking on the topic of whether you can or can’t mix object and relational models, and what I have discovered has been quite enlightening. In my previous posting, I determined that the same data hierarchies can be represented in either an object or relational model. I also determined that you can mix the models (why not?), but that the trick is not in the mixing but in the realization that the model you have is a mixed one. The sad realization was that most people do mix the models and are completely unaware of having done so, and I needed to figure out why (and how) this was happening.

The use of ids in objects is where the models typically mix, and also where most people fail to realize that a mix has taken place. It seems that a simple integer id incorporated into an object is a seductively powerful concept. I won’t claim to completely understand why this is, but I will share with you my thoughts on why the id has such seductive power that even the die-hard OO zealots fall victim to its allure.

For the older generations of developers (to which I belong) who learned to program with the original high-level languages and who have experience with relational databases, the use of an id is quite natural. We did not typically design our data models using hierarchical data structures. We thought in terms of records, and if we designed any complex data model, we designed it in a relational database. You quickly learn to use ids to link tables together, and since we started out with more resource-constrained systems, the appeal of using 3NF relational models complements our awareness of the amount of resources our applications consume. When your code is written to deal with data as arrays of structures, it is much, much easier to write functions (we didn’t call them methods) to manipulate the data if you can just pass around an id for the record, or perhaps a pointer, rather than moving the entire structure.

The basics of writing programs typically include the use of arrays, and eventually arrays of data structures, which are usually our first forays into more advanced programming tasks involving large amounts of data. When you use an array, the index of the array becomes a “natural” id for a data record. The same is true for anyone learning an OO language; the basics still involve learning about arrays. It’s quite easy to adapt from an array index to an id column in a relational database. The use of those ids inside a data structure that references another data structure works very well. The code itself might use a pointer, but you can’t write a pointer out to disk and then read the same data back again and still have the pointer be valid. If it’s an id, you can… ids fit perfectly into the relational model when using a relational database.
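To make the array-index-as-id point concrete, here is a minimal sketch in Python. The `Customer` and `Order` records, the `customer_id` field, and the JSON round trip standing in for "disk" are all illustrative assumptions of mine, not anything from a real system.

```python
import json
from dataclasses import dataclass, asdict


@dataclass
class Customer:
    name: str


@dataclass
class Order:
    customer_id: int  # index into the customers array, not an in-memory reference
    total: float


customers = [Customer("Ada"), Customer("Grace")]
orders = [Order(customer_id=1, total=42.0)]


def save(customers, orders) -> str:
    """Serialize both arrays; the integer id survives where a pointer would not."""
    return json.dumps({
        "customers": [asdict(c) for c in customers],
        "orders": [asdict(o) for o in orders],
    })


def load(text: str):
    """Rebuild brand-new objects from the serialized form."""
    raw = json.loads(text)
    return ([Customer(**c) for c in raw["customers"]],
            [Order(**o) for o in raw["orders"]])


restored_customers, restored_orders = load(save(customers, orders))

# The restored objects live at different addresses, yet the id still resolves.
owner = restored_customers[restored_orders[0].customer_id]
```

After the round trip, `owner` is a different in-memory object from `customers[1]`, but the id still finds the right record, which is exactly what a raw pointer could not do.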

Of course even a C programmer can create hierarchical data structures using pointers, linked lists, and such. In the end though, when we need to persist that data, the ids are convenient.

So what about the object world?

I believe that in a “pure” OO model, the ids do NOT belong. They represent a direct coupling of the persistence layer and the object layer, which is considered a violation of most OO methodologies. There is a problem with object models though, which is that a complete object hierarchy graph can actually represent a lot of duplicated data. The exact same reasons why we use 3NF in relational models apply here too: it takes a LOT of effort to ensure that all the copies of a child object get updated in all the in-memory object graphs. In C you’d just use pointers to “solve” this issue, but a pure object model does not expose such a low-level capability, which is where the use of an id comes into play. The id effectively becomes an abstract pointer to another object, eliminating the need to include the entire child object, while still allowing an easy way to access the data in the child object. Only populating the id of a child object becomes a convenient solution for “lazy loading” our object graph. The other issue is that there are very few object databases, so the vast majority of the time you’ll be using a relational database for the persistence layer.
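A rough sketch of the duplication problem, and of the id acting as an “abstract pointer”, might look like this in Python. The class names, the `customer_store` dictionary, and the email values are hypothetical and exist only for illustration.

```python
from dataclasses import dataclass


@dataclass
class Customer:
    id: int
    email: str


@dataclass
class OrderWithCopy:
    customer: Customer  # the full child object is embedded in each graph


@dataclass
class OrderWithId:
    customer_id: int  # only an "abstract pointer" to the child


# Two graphs loaded separately each carry their own copy of customer 7.
a = OrderWithCopy(Customer(7, "old@example.com"))
b = OrderWithCopy(Customer(7, "old@example.com"))
a.customer.email = "new@example.com"  # b's copy is now stale

# With ids, there is a single copy that every order resolves through.
customer_store = {7: Customer(7, "old@example.com")}
c = OrderWithId(customer_id=7)
d = OrderWithId(customer_id=7)
customer_store[c.customer_id].email = "new@example.com"
```

In the first half, `b.customer.email` still holds the old value; in the second half, looking up `d.customer_id` sees the update immediately. That convenience is the seduction, and the `customer_store` lookup is the relational idea sneaking into the object.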

So the id winds up being used as a compact, convenient alias for a child object, providing us with a convenient solution to “lazy loading” object graphs, and allowing us to pass an id as an argument to a method in place of passing the object itself. Convenience aside, this violates the pure OO methodology and introduces coupling not only between the object and the persistence layer but, depending on usage, may also create tighter coupling between objects and the external methods that manipulate them.

The conclusion is that the seductive quality of the id is convenience, and for most developers convenience trumps following the purer object models. In fact, this convenience is so seductive that most OO developers don’t even realize that using an id is mixing object and relational models.

Truth be told, creating a “pure” object oriented system is hard work, really hard. The relational model fits very closely with how a compiler has to actually implement the OO concepts in our HLLs, which means that if you design and use your objects in a more relational way, you’ll wind up with a program that is likely to perform much better than a pure OO system. The CPUs that eventually execute our code (which still has to occur even when we use languages that “execute” in virtual environments like JVMs and the .Net CLR) perform better with relational models than object ones.

I was once told by another developer that the design they always used, and found to work very well, was a one-to-one mapping of object to database table. Truth be told, this is NOT an object model, it’s a relational one. If you can perform a 1:1 mapping from your objects to relational database tables, then you have a relational model; if you had an actual object model, it would NOT map 1:1 to a database. The reason it worked was not that they had implemented an OO system properly. They did manage to avoid mixing the models, but only because they used a pure relational model, and not an object model. As I mentioned before, the pure OO model is very difficult to do properly, especially when you must provide a persistence layer that has to deal with a relational data model in the form of a database.
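Here is roughly what such a 1:1 design looks like, sketched in Python with the standard `sqlite3` module. The `customers` and `orders` tables, their columns, and the `CustomerRow` and `OrderRow` classes are my own invented example, not the other developer’s actual schema.

```python
import sqlite3
from dataclasses import dataclass


@dataclass
class CustomerRow:
    id: int
    name: str


@dataclass
class OrderRow:
    id: int
    customer_id: int  # a foreign key, exactly as in the table
    total: float


conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE orders (
        id INTEGER PRIMARY KEY,
        customer_id INTEGER REFERENCES customers(id),
        total REAL
    );
    INSERT INTO customers VALUES (1, 'Ada');
    INSERT INTO orders VALUES (10, 1, 42.0);
""")

# Each class is a column-for-column mirror of one table.
order = OrderRow(*conn.execute(
    "SELECT id, customer_id, total FROM orders").fetchone())

# Navigating to the "child" is a join by id, not an object reference.
customer = CustomerRow(*conn.execute(
    "SELECT id, name FROM customers WHERE id = ?",
    (order.customer_id,)).fetchone())
```

Notice that `OrderRow` never holds a customer object, only `customer_id`. The classes are really just typed rows, which is why the mapping is so painless.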

This performance aspect is another influence in the seductive power of ids in our objects: their use can improve the performance of a slow OO system by a significant factor.

So how would you implement something like lazy loading in a purely object model? I believe the answer is inheritance, one of the core concepts/capabilities of an OO system, and probably the most abused, misused, or neglected part of an OO language. People either go crazy with so much inheritance that you wind up with unusable code and insanely deep / complex class hierarchies, or they ignore its capabilities altogether and use inheritance as little more than an overly complex type enum.

I think a proper object data model would have a family of objects, where the “top” levels always represent the lazy load state. This level would never contain complex objects or arrays of child objects, and would allow only a single level of child object (i.e. the only child objects included should not contain other objects, only simple types). The various levels of “heavy” loading would be accommodated by additional objects that inherit from the top level and then add their own members for any additional levels of data.
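A small sketch of such a family, written in Python for brevity, could look like the following. All of the names here (`CustomerSummary`, `CustomerFull`, `mailing_label`, `total_items`, and so on) are hypothetical, and this is only one possible reading of the idea.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Address:  # a child object made of simple types only
    city: str


@dataclass
class CustomerSummary:  # the "top" (lazy) level: simple types plus one child level
    name: str
    address: Address


@dataclass
class LineItem:
    sku: str
    quantity: int


@dataclass
class Order:
    items: List[LineItem]


@dataclass
class CustomerFull(CustomerSummary):  # the "heavy" level inherits and adds members
    orders: List[Order] = field(default_factory=list)


def mailing_label(customer: CustomerSummary) -> str:
    """Needs only the lazy data, so it accepts either level of the family."""
    return f"{customer.name}, {customer.address.city}"


def total_items(customer: CustomerFull) -> int:
    """Needs the full graph, so it asks for the heavy level explicitly."""
    return sum(item.quantity for order in customer.orders for item in order.items)


lazy = CustomerSummary("Ada", Address("London"))
full = CustomerFull("Ada", Address("London"),
                    [Order([LineItem("A-1", 2), LineItem("B-2", 1)])])
```

Both `lazy` and `full` can be handed to `mailing_label`, while only `full` makes sense for `total_items`. No id appears anywhere; the type itself says how much of the graph has been loaded.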

Not all OO languages make this easy to do. To make it easy, you’d have to have methods that allow either the base (lazy) object or the inherited object (full object) as an argument. You can do this in C#, but you have to use casts (which is a PITA, and definitely not convenient). You don’t get this functionality unless you use an interface and inheritance, which requires a lot more up-front design and planning, and it has to get into the design early enough to produce benefits.

Welcome to the dark side of Objective Relations… the id is your mistress from the relational side.
