Posted by: camz | June 15, 2008

Telling Good from Bad Design

As the common wisdom goes, the best way to fix a badly designed system is to design it right in the first place.

I know all too well just how much truth there is in that. The challenge is in telling when a design is bad in the first place, which is more difficult than it sounds. There are all sorts of metrics that can be captured during the development process, but none of them will yield any true indication of how good (or bad) the design actually is. Experience tends to be the only reliable guide, and even then it is often just a “gut feeling” that something is wrong, and it can be incredibly difficult to identify what exactly that is. When a design is mediocre, it’s even harder, since there is often an equal mix of good and bad, making it all the more difficult to tell one way or another.

I’ve seen a lot of interesting things, a small handful of amazing things, quite a few questionable things, and even some amazingly horrifying things done in the name of design. I’ve had to dig deep into systems and their design to either work with them or fix them (usually fix them). Through that experience I have discovered that the design of a project can be summarized with one of two simple characteristics: one for good, the other for bad.

Good Design
Good designs produce additional benefits, features, and capabilities that the original designers did not conceive or plan, but that exist as a byproduct of the design. These capabilities can be leveraged with little or no change to the original design, or extend the original design without requiring extensive refactoring or redesign. New capabilities can be found in exceptional designs with nothing more than a different perspective rather than any change to the code or design.

Bad Design
Bad designs produce workarounds, special cases, constant (and often extensive) refactoring of both the code and the design, and are typically accompanied by long, involved explanations [from the designers] on why something can’t be done.

It really is as simple as that. The hard part is that it takes experience to recognize both of these, and there is no magical way to get that experience other than to work on a lot of projects. That includes working on bad projects too. They might suck, but if you pay attention to what went wrong, and take the time to figure out why, even a bad project can become a gold mine of knowledge.

A Good Example – The QNX® RTOS
One system I have encountered that falls into the category of good design is the message-passing inter-process communication (IPC) in the QNX® Realtime Operating System. I have worked with three generations of QNX, and although each generation enhanced the IPC mechanism, the “guts” of the QNX IPC remained the same because the fundamental design was good. They were able to add capabilities to new versions without changing the fundamentals of the original design.

Each and every version of QNX has the IPC mechanism at the core of the OS. For those interested in learning more, check out the System Architecture Guide online. I will focus on a few examples of how this single design feature of the OS demonstrates the characteristic I have described in this article.

One of the first features that QNX gets “for free” as a direct result of building the IPC into the core (yes, it is in the kernel) of the OS is that of modularity. The actual OS is made up of a group of processes that work together to provide the functionality of a traditional monolithic kernel. QNX also extends the IPC mechanism across the network, which has the side effect of making the network completely transparent. This is an exceedingly difficult thing to accomplish in a monolithic kernel, but thanks to the excellent design of QNX’s IPC, it comes “for free”. A network of QNX machines is a loosely coupled multi-processing computer, something that requires special software and application awareness on other platforms to accomplish.

The IPC mechanism allows the OS itself to be built of modules, and once again we see byproducts of this design choice: multiple filesystem modules, multiple network links, device drivers, and so on. Fault protection becomes a byproduct as well. Because the OS is built from multiple cooperating processes that communicate through IPC, each process is protected from the others, and the failure of any one process does not have a catastrophic effect on the whole OS.

The simplicity of the IPC implementation also provided synchronization of cooperating processes, something that, again, is often difficult to implement with other designs. Asynchronous communication was also possible without changing the IPC mechanism, simply by changing how an application made use of the IPC features.
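To illustrate how synchronization can fall out of a blocking message exchange, here is a toy model in Python. It is only a sketch of the general send / receive / reply idea, not the QNX API: the `Channel` class, the `reply` and `server` functions, and the message kinds are all hypothetical names invented for this example, and threads stand in for processes. The sender blocks until the receiver replies, so a reply sent after the work is done gives synchronization, while a reply sent before the work is done gives asynchronous behavior with no change to the channel itself.

```python
import queue
import threading


class Channel:
    """Toy stand-in for a message channel. This is NOT the QNX API."""

    def __init__(self):
        self._inbox = queue.Queue()

    def send(self, msg):
        """Deliver msg and block the caller until the receiver replies."""
        reply_box = queue.Queue(maxsize=1)
        self._inbox.put((msg, reply_box))
        return reply_box.get()

    def receive(self):
        """Block until a message arrives; return (msg, reply_box)."""
        return self._inbox.get()


def reply(reply_box, result):
    """Unblock the sender that is waiting on reply_box."""
    reply_box.put(result)


def server(channel, log):
    """Hypothetical server loop; appending to log stands in for real work."""
    while True:
        (kind, payload), reply_box = channel.receive()
        if kind == "stop":
            reply(reply_box, "stopped")
            return
        if kind == "sync":
            log.append(payload)         # do the work first...
            reply(reply_box, "done")    # ...then release the sender
        else:
            reply(reply_box, "queued")  # release the sender first...
            log.append(payload)         # ...then do the work


log = []
channel = Channel()
worker = threading.Thread(target=server, args=(channel, log))
worker.start()

sync_result = channel.send(("sync", "a"))    # returns only after "a" is processed
async_result = channel.send(("async", "b"))  # returns as soon as "b" is accepted
channel.send(("stop", None))
worker.join()
print(sync_result, async_result, log)  # done queued ['a', 'b']
```

The only difference between the two cases is the order of two lines in the server, which is the sense in which the application, not the mechanism, decides between synchronous and asynchronous use.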

In later versions of QNX, a graphics system was added which QNX called Photon. Photon also leveraged the IPC mechanism to simplify the implementation of graphics drivers, input devices, and “clipping” of windows as they overlap. Photon itself enjoyed its own “freebies” as a direct result of layering on top of the IPC mechanism. Remote consoles, mirrored consoles, multi-monitor support, and foreign platform support (Phindows for Photon in Windows, and PhinX for Photon in X) all became free or trivial with the design.

These represent just a few of the additional features that just “fall out” of a single good (in this case excellent) design.

A Bad Example – Generating SQL in an OO-System
This is an all too common scenario in object-oriented development. You start out with a hierarchical object design, and then create a corresponding relational database design in which to persist the object data. A typical approach is to create a data access layer (DAL) which consists of objects that serialize object data into SQL, and deserialize from SQL back into objects. To keep things simple, especially when the OO-developer is designing the database, it is quite common to see a “one table per object” schema design.

The problems start when you want to retrieve a “complete” object which includes all its hierarchical child objects. The most intuitive solution is to have the DAL object call other DAL objects for the child objects. This is easy to do, and easy to test, and looks fine on the surface. Of course, it isn’t fine, and the first issue to crop up is one of performance. The performance issue is caused by iterating through the child objects, generating additional DAL / SQL calls, which in turn do the same for each level of child objects until there are no levels remaining. For an object with a lot of child objects, or multiple levels in the object hierarchy, this can quickly result in a large number of SQL queries, all for a single object. The issue is compounded when retrieving data for an array or list of these objects.
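To make the query explosion concrete, here is a minimal sketch using Python's built-in `sqlite3` module. The schema (a customer with orders, each order with lines) and every function name are assumptions made up for illustration, not taken from any particular project. Each loader plays the role of a DAL object that calls the loader for its children, and `CountingDb` simply counts how many queries are issued.

```python
import sqlite3

# Hypothetical three-level hierarchy stored one table per object.
SCHEMA = """
CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER);
CREATE TABLE order_line (id INTEGER PRIMARY KEY, order_id INTEGER, item TEXT);
"""


class CountingDb:
    """Wraps an in-memory database and counts every query sent through it."""

    def __init__(self):
        self.conn = sqlite3.connect(":memory:")
        self.conn.executescript(SCHEMA)
        self.queries = 0

    def query(self, sql, params=()):
        self.queries += 1
        return self.conn.execute(sql, params).fetchall()


def load_lines(db, order_id):
    rows = db.query("SELECT id, item FROM order_line WHERE order_id = ?", (order_id,))
    return [{"id": r[0], "item": r[1]} for r in rows]


def load_orders(db, customer_id):
    rows = db.query("SELECT id FROM orders WHERE customer_id = ?", (customer_id,))
    # One extra query per order to fetch that order's lines.
    return [{"id": r[0], "lines": load_lines(db, r[0])} for r in rows]


def load_customer(db, customer_id):
    (row,) = db.query("SELECT id, name FROM customer WHERE id = ?", (customer_id,))
    return {"id": row[0], "name": row[1], "orders": load_orders(db, row[0])}


def make_sample_db(n_orders=10, lines_per_order=3):
    """Build sample data; inserts bypass the counter on purpose."""
    db = CountingDb()
    db.conn.execute("INSERT INTO customer VALUES (1, 'Acme')")
    for order_id in range(1, n_orders + 1):
        db.conn.execute("INSERT INTO orders VALUES (?, 1)", (order_id,))
        for n in range(lines_per_order):
            db.conn.execute(
                "INSERT INTO order_line (order_id, item) VALUES (?, ?)",
                (order_id, f"item-{n}"),
            )
    return db


db = make_sample_db()
customer = load_customer(db, 1)
# 1 customer query + 1 orders query + 10 order_line queries
print(db.queries)  # 12
```

With only ten orders, one customer already costs twelve round-trips, and a list of a hundred such customers would cost twelve hundred.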

Let’s take a step back to look at this. The design isn’t very good because it fails to resolve the issue of making an object hierarchy work well with a relational database. The design is simple, which would normally be good, but the simplicity results in iterating through the object hierarchy and an increased number of round-trips to the database server. Neither of these would be considered a beneficial byproduct of the design.

Unfortunately, this is often regarded as a performance issue rather than a design issue, and a workaround is then introduced to improve the performance. A typical solution is to extend the DAL objects with a “depth” attribute that limits the number of child object levels that are retrieved. Every single DAL object must now be modified to use and update this depth-tracker. When we are all done, it works, and performance will be improved, but even the improved performance can still be quite poor, because we have only reduced the amount of iteration rather than eliminating it.
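A sketch of what the “depth” workaround might look like, again with Python's `sqlite3` and a hypothetical customer / orders / order lines schema (all names here are assumptions for illustration). Every loader that has children now has to accept, check, and pass on the depth value, and a child collection left as `None` means “not loaded”.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER);
CREATE TABLE order_line (id INTEGER PRIMARY KEY, order_id INTEGER, item TEXT);
INSERT INTO customer VALUES (1, 'Acme');
""")
for order_id in range(1, 11):
    conn.execute("INSERT INTO orders VALUES (?, 1)", (order_id,))
    conn.execute(
        "INSERT INTO order_line (order_id, item) VALUES (?, 'widget')", (order_id,)
    )

query_count = 0


def query(sql, params=()):
    """Run a query and count it."""
    global query_count
    query_count += 1
    return conn.execute(sql, params).fetchall()


def load_lines(order_id):
    rows = query("SELECT id, item FROM order_line WHERE order_id = ?", (order_id,))
    return [{"id": r[0], "item": r[1]} for r in rows]


def load_orders(customer_id, depth):
    rows = query("SELECT id FROM orders WHERE customer_id = ?", (customer_id,))
    orders = []
    for (order_id,) in rows:
        order = {"id": order_id, "lines": None}  # None means "not loaded"
        if depth > 1:
            order["lines"] = load_lines(order_id)
        orders.append(order)
    return orders


def load_customer(customer_id, depth):
    (row,) = query("SELECT id, name FROM customer WHERE id = ?", (customer_id,))
    customer = {"id": row[0], "name": row[1], "orders": None}
    if depth > 1:
        customer["orders"] = load_orders(customer_id, depth - 1)
    return customer


def queries_for(depth):
    """Return how many queries one customer costs at the given depth."""
    global query_count
    query_count = 0
    load_customer(1, depth)
    return query_count


print([queries_for(d) for d in (1, 2, 3)])  # [1, 2, 12]
```

Limiting the depth helps callers that do not need the lower levels, but any caller that does need the whole object pays the full per-child cost again, and every caller now has to cope with partially loaded objects.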

This process can continue indefinitely, because workarounds usually address the symptoms of the real problem instead of the problem itself. They also tend to be very specific and targeted, increasing the likelihood that any given patch will introduce new issues and create new symptoms, since the underlying issue has not been addressed. After a while, the number of patches can make identifying the root cause more difficult. This can happen very easily in teams focused on OO development methodologies, where there is often a mindset to avoid premature optimization. If the issue is misdiagnosed as a performance issue rather than a design issue, the fix may get pushed out to a point where it is no longer feasible to change the underlying design.

One of the better solutions to the SQL issue outlined here is quite simple: stop thinking in terms of OO, and start thinking in relations, which is what the database uses. Once you do this, it becomes obvious that you can take the most frequent requests, identify how many levels of objects are involved, and write a single SQL query that uses joins to get all the levels of data in one fell swoop. This is one example of the disconnect between OO and relational methodologies. What is obvious in one isn’t necessarily efficient, optimal, or even a good idea in the other, and may even be contrary to the other methodology.
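As a rough sketch of the relational approach, the following Python `sqlite3` example loads a whole object with one joined query and rebuilds the hierarchy in memory. The customer / orders / order lines schema, the query, and the function names are hypothetical, chosen only to illustrate the idea. `LEFT JOIN` is used so that a customer with no orders, or an order with no lines, still comes back.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER);
CREATE TABLE order_line (id INTEGER PRIMARY KEY, order_id INTEGER, item TEXT);
INSERT INTO customer VALUES (1, 'Acme');
INSERT INTO customer VALUES (2, 'No Orders Inc');
""")
for order_id in range(1, 11):
    conn.execute("INSERT INTO orders VALUES (?, 1)", (order_id,))
    for item in ("widget", "gadget"):
        conn.execute(
            "INSERT INTO order_line (order_id, item) VALUES (?, ?)", (order_id, item)
        )

# One query returns every level of the hierarchy as flat rows.
FULL_CUSTOMER_SQL = """
SELECT c.id, c.name, o.id, l.id, l.item
FROM customer AS c
LEFT JOIN orders AS o ON o.customer_id = c.id
LEFT JOIN order_line AS l ON l.order_id = o.id
WHERE c.id = ?
ORDER BY o.id, l.id
"""


def load_customer(customer_id):
    """Load a customer with all orders and lines in a single round-trip."""
    rows = conn.execute(FULL_CUSTOMER_SQL, (customer_id,)).fetchall()
    if not rows:
        return None  # no such customer
    customer = {"id": rows[0][0], "name": rows[0][1], "orders": []}
    orders_by_id = {}
    for _cid, _name, order_id, line_id, item in rows:
        if order_id is None:
            continue  # customer exists but has no orders
        order = orders_by_id.get(order_id)
        if order is None:
            order = {"id": order_id, "lines": []}
            orders_by_id[order_id] = order
            customer["orders"].append(order)
        if line_id is not None:
            order["lines"].append({"id": line_id, "item": item})
    return customer


customer = load_customer(1)
line_total = sum(len(o["lines"]) for o in customer["orders"])
print(len(customer["orders"]), line_total)  # 10 20
```

The trade-off is that the join repeats the parent columns on every row, so the grouping code has to fold the flat result back into objects. That is work the database round-trips used to hide, but it happens once in memory instead of once per child over the wire.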

Each step taken in this bad design is relatively simple, and appears to be the most intuitive at the time, making it all too easy to wind up with a design that has serious problems. The symptoms often masquerade as a different problem, making it difficult to realize that the flaw is in the design rather than the implementation. This often occurs in the absence of proper requirements, which forces an iterative cycle of partial design intermixed with coding in place of upfront design and an understanding of how each element of the solution will be implemented and used.

This is perhaps why this particular example (mapping object to relational) comes up so often, and why it is so easy to take such a course without understanding that the design needs to change, not just the implementation.

As you can see, in the bad design example, we got no beneficial byproducts from the design, and instead got a series of issues and problems requiring more and more workarounds to address.



