Posted by: camz | April 12, 2005

The Knuth of the Matter

“People who are more than casually interested in computers should have at least some idea of what the underlying hardware is like. Otherwise, the programs they write will be pretty weird.”
– Donald Knuth, 2004

Donald Knuth is very wise, a god of computing, so I don’t doubt for a second that he understands that this quote applies equally to layers of software abstraction. I love this quote, though; it has some wonderful subtleties that I am sure are completely lost on fragile programmers.

So let’s talk about abstraction. It’s a wonderful thing, but only if there is some comprehension of what is being abstracted. Sure, you can treat the underlying abstraction layer as a black box, but when we are talking about software, that can, more often than not, be a fatal thing to do. Of course, with software abstractions the fatality is rarely instant; instead, an obscure set of circumstances, difficult or impossible to document, must occur. As so many of the current and next generation of high-level languages gain acceptance, there is more and more abstraction and less and less understanding of the underlying layers.

One example is the use of XML for everything. Don’t get me wrong, I like XML. What I am opposed to and appalled by is that so many developers use it for everything. I’ve seen it used for configuration files, for scripts, for calling functions and passing arguments, and for data payloads and IPC. It can do all of those; in fact, it can do some very well. So, what’s the problem? Well, there are several. The first is that XML isn’t always the best solution; it might not even be a suitable solution. The second is that all too often a developer chooses XML without even questioning or wondering if it’s the appropriate solution. The availability of XML parsers makes it convenient, and it’s also pretty popular right now.

This is where we come back to some of the subtleties of Knuth’s statement. I know, you are thinking “but what does XML have to do with hardware?”

Ah, see… that’s the subtlety. At some point you have to do something with that XML. You might need to parse it, or store it, or transmit it.

To parse it you need CPU power. XML isn’t exactly the easiest thing to parse; it requires a decent amount of processor and memory to get the job done. That places a constraint on just how fast you can do that job, or possibly on how many concurrent jobs you can do without it taking so long as to become unacceptable. The more places you use XML, the more of a concern this becomes.
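If you want to see the parsing cost for yourself rather than take my word for it, a rough sketch like the one below will do. It is only an illustration: the records, the pipe-delimited format and the helper names are all invented for this example, and the timings will vary from machine to machine, so measure on the hardware you actually care about.

```python
import time
import xml.etree.ElementTree as ET

# Hypothetical records (id, name, price), invented purely for illustration.
records = [(i, f"item{i}", i * 0.5) for i in range(2000)]


def to_xml(rows):
    """Serialize the rows as a small XML document."""
    root = ET.Element("items")
    for ident, name, price in rows:
        item = ET.SubElement(root, "item")
        ET.SubElement(item, "id").text = str(ident)
        ET.SubElement(item, "name").text = name
        ET.SubElement(item, "price").text = repr(price)
    return ET.tostring(root, encoding="unicode")


def to_delimited(rows):
    """Serialize the same rows as one pipe-delimited line per record."""
    return "\n".join(f"{i}|{n}|{p!r}" for i, n, p in rows)


def parse_xml(text):
    """Parse the XML document back into a list of tuples."""
    root = ET.fromstring(text)
    return [
        (int(e.findtext("id")), e.findtext("name"), float(e.findtext("price")))
        for e in root
    ]


def parse_delimited(text):
    """Parse the pipe-delimited text back into a list of tuples."""
    out = []
    for line in text.split("\n"):
        ident, name, price = line.split("|")
        out.append((int(ident), name, float(price)))
    return out


def best_time(fn, arg, repeat=5):
    """Return the fastest of several runs, in seconds."""
    best = float("inf")
    for _ in range(repeat):
        start = time.perf_counter()
        fn(arg)
        best = min(best, time.perf_counter() - start)
    return best


xml_text = to_xml(records)
flat_text = to_delimited(records)

print(f"XML parse:       {best_time(parse_xml, xml_text):.4f} s")
print(f"delimited parse: {best_time(parse_delimited, flat_text):.4f} s")
```

Both parsers recover exactly the same records, so any difference in the numbers is the price of the format and not of the data.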

What about storing it? XML is very verbose; I don’t think anyone will disagree that one thing XML isn’t is small. That means it uses more disk space than other solutions. It also means it takes longer to read it in (not counting the time it takes to parse). You might need a bigger disk, or a faster disk, or both. I suspect that some of you are saying that yes, XML is large, but it usually compresses quite well, so you can store it compressed. A nice easy solution. Well, not exactly. It’s a trade-off: compression isn’t free either; it takes CPU and possibly memory too. It’s convenient though, you probably have an API or method that does it for you. Another example where the convenience of a solution takes the place of actually thinking about its suitability.
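The storage trade-off is just as easy to poke at. The sketch below, again using made-up records and a made-up pipe-delimited alternative, compares the raw size of the same data in both formats and then times how long zlib takes to squeeze the XML down. Treat the output as a rough indication only; real documents and real disks will behave differently.

```python
import time
import zlib

# Hypothetical records (id, name, price), invented purely for illustration.
rows = [(i, f"item{i}", i * 0.5) for i in range(2000)]

# The same data twice: once as XML, once as a pipe-delimited flat file.
xml_bytes = (
    "<items>"
    + "".join(
        f"<item><id>{i}</id><name>{n}</name><price>{p}</price></item>"
        for i, n, p in rows
    )
    + "</items>"
).encode("utf-8")

flat_bytes = "\n".join(f"{i}|{n}|{p}" for i, n, p in rows).encode("utf-8")

# Compression shrinks the XML, but it spends CPU time to do so.
start = time.perf_counter()
compressed = zlib.compress(xml_bytes, 6)
compress_seconds = time.perf_counter() - start

print(f"XML size:            {len(xml_bytes)} bytes")
print(f"delimited size:      {len(flat_bytes)} bytes")
print(f"compressed XML size: {len(compressed)} bytes")
print(f"time to compress:    {compress_seconds:.4f} s")
```

The bytes you save on disk are paid for in CPU time on every write, and again on every read when you decompress.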

Transmitting XML has the same issues as storing it, and a couple more too. We enjoy nice, fast, low-latency, inexpensive broadband access to the internet. It’s been years since you even thought of something as primitive as a 56K dial-up connection, let alone a 14.4K one. You have lots of bandwidth to spare. Sure, that XML payload is big, but it still transmits in a couple of seconds, so no problem, right? Wrong. Most of the time when you are transmitting XML, it’s for a commercial/business application. Bigger messages mean that we max out our bandwidth sooner, which means fewer transactions per second. Want to handle more? Now you need a bigger connection, and those are expensive. Want to support wireless users? They don’t have as much bandwidth, and suddenly your solution isn’t working so well.
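The bandwidth arithmetic is simple enough to do on a napkin, but here it is as a small sketch. The link speed and message sizes are hypothetical numbers picked for illustration, and the calculation ignores protocol overhead and latency, so it gives an optimistic upper bound at best.

```python
def max_transactions_per_second(link_bits_per_second, message_bytes):
    """Upper bound on messages per second, ignoring protocol overhead."""
    return link_bits_per_second / (message_bytes * 8)


# Hypothetical figures, chosen only to make the arithmetic concrete.
LINK_BPS = 1_500_000          # a 1.5 Mbit/s connection
XML_MESSAGE_BYTES = 4000      # a verbose XML payload
COMPACT_MESSAGE_BYTES = 500   # the same data in a tighter format

xml_tps = max_transactions_per_second(LINK_BPS, XML_MESSAGE_BYTES)
compact_tps = max_transactions_per_second(LINK_BPS, COMPACT_MESSAGE_BYTES)

print(f"XML payload:     {xml_tps:.1f} transactions/s at most")
print(f"compact payload: {compact_tps:.1f} transactions/s at most")
```

With these assumed numbers, a message eight times larger means the same link tops out at one eighth of the transactions per second.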

I was going to say something about threads in relation to this topic, but I see that XML has proven to be more than enough.

Knuth has got it right, but I think maybe he left something unsaid. Sometimes it isn’t just the programs they write that can be pretty weird, but also the way those programs behave.

