
NOTE TO TOOL OWNERS: In this blog I will occasionally make statements about products that you will take exception to. My intent is to always be factual and accurate. If I have made a statement that you consider to be incorrect or inaccurate, please bring it to my attention and, once I have verified my error, I will post the appropriate correction.

And before you get too exercised, please read the post, dated 9 Feb 2006, titled "All Tools Suck".

Sunday, February 26, 2006

I was wrong (sort of) about namespaces

As we were putting the final brush strokes on the XML 1.0 Recommendation (almost 10 years ago now--hard to believe) we started working on "namespaces for XML". At the time I was quite vocal in my opposition to namespaces on the grounds that they didn't really solve any problems, were misguided, and totally missed the point. I was also very upset, if memory serves, about the issue of using attributes to declare namespaces rather than processing instructions. It got to the point where I couldn't really be constructive on the weekly conference calls, my blood pressure was going up, and I just generally couldn't deal with it any more. In my mind the whole XML thing was going off the rails just as we needed to focus on the really important stuff, namely XLink. So I quit my participation in the XML Working Group, made a lot of strong statements that hopefully have been largely forgotten, took my toys, and went home.

Boy was I wrong. Mostly. Sort of.

I don't know that I've ever actually apologized to anyone about this, especially Tim Bray, who was the primary driver of the namespaces mechanism. In any case--to the degree that my behavior at the time impeded the completion of namespaces or otherwise disrupted the activity, I apologize.

So what was I wrong about?

My objections to namespaces as they were being formulated (and as they subsequently got defined) were:
  1. There was no standard mechanism for defining membership of names in a namespace
  2. There was no standard mechanism for binding a given namespace to an abstract application
  3. Using attributes to declare namespaces was just wrong wrong wrong
  4. Namespaces didn't really solve the interoperation problems that people had or perceived
At the time I had just spent several years rewriting the HyTime standard (ISO/IEC 10744:1996). One of the key features of HyTime was a mechanism called "SGML Architectures" which did both of the first two things I claimed (correctly) that namespaces didn't do. Namely, it provided a way to unambiguously bind document types (sets of named elements and attributes) to abstract applications and defined a standard way to formally declare the members of that namespace ("architectural DTDs"). It also provided a way to map instance-level element and attribute names to architectural names, the functional equivalent of using different namespace prefixes to disambiguate otherwise clashing local names.

At the time I was so deeply indoctrinated with the big-iron, big-industry, big-standard "everything must be fully defined up front" way of thinking that I just couldn't see a less formal approach working. I hadn't yet fully gotten the less-is-more Way of the Web. So I objected to namespaces on the grounds that they were just creating this huge potential mess that wouldn't really help.

It's still true that the namespace specification, by itself, provides no formal way to define the set of names in a namespace. That is, by the namespace spec, given a name, there's no way to determine if that name is or is not a valid member of the namespace.

With just DTDs this was kind of a problem because there was no defined way to bind a DTD to a namespace (something we did do in the HyTime standard). But eventually XSD schemas (and, I should think, all other XML document constraint specification mechanisms designed since the publication of the namespace spec [But not Relax as of March 2007--WEK]) did provide a way to bind document types to namespaces. This still doesn't formally define the finite set of members in a namespace but it does let you define a set of names for the namespace. The subtle difference here is that the definition of the set of names points to the namespace rather than the namespace pointing to the definition.
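
To make that concrete, here's the sort of thing I mean--a minimal XSD sketch with an invented namespace URI and invented element names, not any real schema. The schema names the namespace it governs via its targetNamespace attribute; nothing about the namespace points back at the schema.

    <xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
               targetNamespace="http://example.org/ns/rants"
               xmlns="http://example.org/ns/rants"
               elementFormDefault="qualified">
      <!-- Every element declared here is asserted to be a member of the
           http://example.org/ns/rants namespace. The schema points to the
           namespace; the namespace never points to the schema. -->
      <xs:element name="rant">
        <xs:complexType>
          <xs:sequence>
            <xs:element name="title" type="xs:string"/>
            <xs:element name="body"  type="xs:string"/>
          </xs:sequence>
        </xs:complexType>
      </xs:element>
    </xs:schema>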

Think about that for a moment--the only standard-defined thing that represents a namespace (per the namespace spec) is the namespace URI--there is no requirement (or even expectation) that that URI be resolvable to any resource. A naive person (or a person steeped in big-systems, declare-everything thinking) might expect that a namespace URI would always resolve to a resource that describes the namespace as an object. From a pure engineering/mathematical completeness standpoint that seems like a reasonable thing--anything you can name should have a concrete representation: "Dammit Spock, there needs to be an object!"

So why don't namespaces have this? Simple answer: because they didn't need it in order to be useful. That's the beauty of the Web: it's agile methods at work--the simplest thing that could possibly work.

Namespaces work because, by and large, there's no confusion about what application a given namespace is associated with. In practice there just aren't many important namespaces in use that people don't already know, up front, what they're about. The full declaration bit would only be needed in a world where you are likely to encounter a namespace you've never seen and only at that point need to find out what it's about by following the namespace URI. If it were purely software doing this discovery and data finding, then having everything formally declared would be necessary. But in fact it's usually humans doing the discovery and finding, and just doing a Google[tm] search on the namespace URI is usually sufficient to take you to the specification that defines the semantics for that namespace. So why add one more layer of standardization to an already complicated system? Duh.

I do observe, however, that there still seems to be lots of work on web services registries and all that kind of stuff. I try to stay out of that world because it's not relevant to my day-to-day work and I don't otherwise find it interesting, but I can imagine that it's hard stuff to standardize, both technically and politically.

Which means that it was in fact quite brilliant for the namespace spec to avoid the whole issue by saying that any such stuff was "out of band".

So my first errors were in thinking that there was in fact a need for a single way to limit the membership in a namespace and a need for a central physical representation of the namespace as an object.

So what about using attributes to define namespace declarations? I (and not just I, but a number of members of the committee at the time) felt that intruding on the document author's attribute space in order to do declarative stuff was fundamentally, morally wrong and that processing instructions were there precisely to do this type of declaration. We were, of course, completely correct. But it didn't matter--the Powers that Be had a thing against processing instructions and others insisted that attributes were clearer and whatnot. Again, for me, not intruding on the author's space was a fundamental, inviolate principle I had learned deeply from my work on SGML and HyTime. It was an essential tenet of my technical faith and I defended it with the fervor of a True Believer.

In retrospect I see now that using attributes was the correct choice: the namespace information really consists of intrinsic properties of the elements (and attributes), and therefore putting those properties on attributes was the best thing to do absent a more specialized syntax. It did complicate XML processors a bit because they now have to do some special-case stuff with those namespace attributes, but who cares? That just affects tool implementors, not users.
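
For the record, here's what the attribute-based declaration ended up looking like in an instance (the document namespace URI is invented for illustration; the XLink one is real):

    <doc xmlns="http://example.org/ns/doc"
         xmlns:xlink="http://www.w3.org/1999/xlink">
      <!-- xmlns and xmlns:xlink are the reserved declaration attributes.
           They sit in the author's attribute space, which is what we
           objected to, but they travel with the elements whose names
           they qualify, which is why attributes were the right call. -->
      <section xlink:type="simple" xlink:href="other-doc.xml">...</section>
    </doc>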

About solving the interoperation problem: if I remember the discussion at the time, one of the driving use cases was a situation where an automatic processor would need to dynamically combine elements from two different input documents into a new instance while ensuring that there were no name collisions. I don't know if this use case actually exists but it seemed both bogus and irrelevant at the time. Bogus because I didn't really see where this would happen (but I can't claim any particular breadth of vision for how XML is used--I've always been clear that I represent the world of authoring and publishing). Irrelevant because in this scenario, even with namespaces, you'd have to be prepared to rewrite tags as part of the combination process (either to change namespace prefixes or to disambiguate entire tag names), so namespaces didn't really solve anything here. (And HyTime had already defined an attribute-based mechanism for mapping local tag names to fully-qualified names so why did we need namespaces?).
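
For what it's worth, the use case looked something like this (both vocabularies and their URIs are invented for illustration): the prefixes keep the two otherwise-identical local names apart, but the combining processor still has to make sure the prefixes themselves don't collide, rewriting them if they do.

    <report xmlns:inv="http://example.org/ns/inventory"
            xmlns:cat="http://example.org/ns/catalog">
      <!-- Same local name, different namespaces: no collision, as long as
           nobody bound the same prefix to two different URIs upstream. -->
      <inv:title>Warehouse counts, Q1</inv:title>
      <cat:title>Spring product listing</cat:title>
    </report>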

Again, I was right but pointlessly so. In practice it's rare that namespace prefixes clash because most namespaces are well known and defined with a conventional prefix that is likely to be unique among the set of namespaces likely to be combined (again, the more general case of willy-nilly combination of unexpected stuff just doesn't much happen).

So now, in the fullness of time, I have come to fully appreciate the value, nay the essentialness, of namespaces. With namespaces, as with HyTime (but with much less definitional overhead), names can be unambiguously bound to applications via namespace URIs formally associated with schemas (i.e., targetNamespace) and via formal specifications that define the semantics associated with namespaces.

While the direction of pointing (from specification to namespace) may seem backwards, it works just fine because nobody's going to define two different standards for the same namespace (at least not on purpose). That is, there seems to be pretty good clarity about what (abstract) application is associated with a given namespace.

For schemas it's not quite as clean but it still works well enough. The issue is that because there is no single standard way to point from a namespace to the formal definition of its member names (because there's no standard object that represents the namespace and that could then do the pointing), it's necessary for vocabulary definitions to point to the namespaces they govern. This pointing can be indirect (via a specification that points to formal vocabulary definitions and asserts that they govern the namespace) or direct, i.e., XSD's targetNamespace attribute.

The problem with the direct pointing is that two schema instances might point to the same namespace, creating an ambiguity about which one is the right one to use at a given moment or for a given purpose. But again, in practice, that's usually resolved by the people setting up the local system or documenting the specification or whatever. But there are times when the ambiguity is a practical problem (which I'll put in another entry before too long).
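
One conventional (if non-binding) way that gets resolved in the instance itself is the schema location hint, sketched here with invented URIs:

    <r:rant xmlns:r="http://example.org/ns/rants"
            xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
            xsi:schemaLocation="http://example.org/ns/rants
                                http://example.org/schemas/rants-v2.xsd">
      <!-- The schemaLocation value pairs a namespace with the schema
           document the author intends for it. It's only a hint, though:
           a processor (or the people who set up the local system) can
           ignore it and supply a different schema for the namespace. -->
    </r:rant>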

But even with this potential for ambiguity I think that the importance of unambiguous binding of documents to schemas is so great that all elements should be in a namespace.

I also think that the potential for document type modularity offered by namespaces coupled with XSD-schema-type features is quite great, and that we, as a community, have not yet fully worked out all the issues, practicalities, and potential benefits (as evidenced by, for example, the ongoing sincere discussion of how best to apply namespaces to the DITA specification).

But I am now foursquare in favor of namespaces--they may not be perfect but they are essential and they work well enough as defined (and in any case they are what they are and aren't likely to change any time soon). If you're not using namespaces you should be--I can't see any excuse for anyone defining any set of XML elements that is not in a namespace. It should be required and it's too bad that XML, for compatibility reasons, has to allow no-namespace documents. Oh well.

Live and learn.


For Techies Who Are Also Parents (Or Thinking of Being)

While I promised that this blog would be purely technical I'm going to beg your indulgence while I take advantage of the small amount of search engine exposure I have gained to try to gain some for my wife's blog on motherhood, which I think needs an audience.

Now mind you, I wouldn't be doing this if I didn't think her blog was in fact something that the audience for this blog would enjoy. Let me be clear: while she's writing about motherhood, it is not in any way a sunshine and flowers "oh children are so precious and being a mother has finally fulfilled me as a woman blah blah blah" kind of thing. It is in fact not far in spirit and tone from this blog, namely, a sometimes brutally honest, but realistic, personal, and heart-felt take on motherhood.

I think my wife is a pretty good writer, and if you have any interest in parenting you might find Julie's writings entertaining. She describes her blog as a personal journal with an audience, so she doesn't "write on and on and on, getting ever more depressed and pissed off, awash in the sort of self-pity that your average maudlin 16-year-old would find embarrassing."

The blog is "Pissed Off Mom" ("Angry Mom" was taken). Note that the "pissed off" is in reference primarily to society's expectations about motherhood, not her personal state of mind vis-à-vis her (our) child, who is in fact so full of joy that you rarely have cause to be angry at her, even when she's in full two-year-old mode. [If you want the sunshine and flowers stuff, you can go here.] And please don't misunderstand: Julie is a wonderful mother--it's just that she's been around long enough to understand that no endeavor in life is 100% joy all the time, including being a parent, and she needs to write about it.

Now back to our regularly-scheduled technical ranting....

Friday, February 24, 2006

Rants Preview

I'm in the process of moving house and I've been sick and blah blah blah haven't really had time and energy to post to this blog but I thought I would at least preview some of the rants kicking around in my head that I do plan to write about. These are things that I've been thinking about for a long time and/or have already ranted about in one place or another but haven't necessarily written down in as cogent a fashion as I should have. These are in no particular order:

  • External parsed entities are bad and you should never use them

    Short form: Entities are not reusable objects and only lead to pain. Besides, you should be using schemas (no entity mechanism) anyway. It was our mistake to allow them to remain in XML and I for one am as guilty of that mistake as anyone. I now regret it.
  • I was wrong (sort of) about namespaces

    Short form: I said at the time that namespaces were wrong because they didn't define a standard way to bind a namespace to an (abstract) application. In practice it didn't matter and as it happens, schemas (and similar document constraint mechanisms) provide a standard way to do this binding (sort of).
  • XML lacks a standard way to identify abstract applications as distinct from their associated namespaces and schemas

    Short form: An abstract application (for example DITA, your corporate technical documentation system, a cross-industry data interchange scheme, etc.) may involve any number of document types and namespaces. There is no defined way to give a name to the application, as an abstraction, and then formally map that name to the set of namespaces and their associated schemas, application semantics documentation, and other related artifacts. In practice it's not clear that this level of formality is needed (which is probably why we don't have it) but it still seems like, for completeness, such a mechanism should exist. However, I think that a lot of people's assumptions (including mine) about how people would deploy and use widely-used, ubiquitous applications were wrong. If the Web has taught us anything it's that sometimes a solution that seems a little too simpleminded is just right. Go figure.
  • All XML content management systems (with very few, if any, exceptions) are wrong

    Short form: any system that manages XML storage at the element level is fundamentally flawed, unnecessarily complicated, doomed to performance and maintenance problems, and just plain misguided. This does not apply to systems that are only intended to enable retrieval, such as MarkLogic. Note that indexing at the element level (which you have to do) is different from storage at the element level.
  • Exactly one XSD schema doc per namespace

    Short form: things get funky when there are multiple XSD documents for a given namespace in the same processing/storage/management environment absent a well-understood mechanism for distinguishing them.
  • All newly-defined element types (that is, element types defined from this time forward) should be in a namespace other than the no-namespace namespace. Legacy applications should be reworked to use namespaces as soon as practical.

    Short form: Namespaces enable unambiguous binding of documents to constraints in a non-author-subvertible way. Namespaces enable clear and unambiguous integration of different vocabularies into a single composite document type.
  • PUBLIC identifiers are bogus and pointless and there is no reason to ever use them, even with DOCTYPE declarations.

    Short form: First, you shouldn't use external parsed entities or DOCTYPE declarations anyway (in which case PUBLIC IDs aren't even an option). Beyond that, in XML, PUBLIC identifiers are redundant with URIs for the same resource and therefore just add another name to an already crowded world of names to be managed. OASIS catalogs can remap URIs as easily as PUBLIC IDs (see the catalog sketch after this list), so there's no indirection advantage there (and there never was--they were bogus in SGML too but I don't think any of us really appreciated it at the time).
  • XInclude is good as far as it goes but it gets ID handling during transclusion wrong

    Short form: It is unnecessarily and inappropriately constraining to require XML IDs to be unique among all the members of a compound document. XInclude processors must be capable of rewriting IDs and references to them so that the transcluded result retains the appropriate uniqueness constraints. I understand why XInclude imposes this requirement but in practice it is not useful, especially in the context of document authoring systems (see the XInclude sketch after this list). (I wrote a paper about this for one of the recent XML Europe conferences.)
  • XInclude is not, as specified, appropriate for document authoring.

    Short form: Practical authoring requires that XInclude elements be specialized so that references can have context constraints imposed and so that they can express constraints on what can and cannot be referenced. (Also in my XML Europe paper).
  • Xerces, while otherwise an exemplary tool and a key part of any Java-based XML processing system, is fundamentally flawed in how it enables/does URL resolution via OASIS catalogs

    Short form: OASIS catalogs clearly enable and expect URLs to be recursively mappable via a single catalog. That is, an XML processor doing catalog-aware resource resolution should continue to try to use the available catalogs to resolve a URL until all catalog entries are exhausted--only then do you attempt to resolve the result URL to a resource. Out of the box, Xerces does not do this and provides, as far as I can tell, no API to enable it. In particular, Xerces conflates or confuses entity resolution with resource resolution. I tried to find a code fix but the code was too convoluted and the problem seemed to be a fundamental design flaw that would require significant change to the Xerces API. I submitted a bug but was essentially told "not a bug". But it's a bug. Ask Norm. (The catalog sketch after this list shows the kind of chained mapping I mean.)
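
To make the PUBLIC identifier and Xerces items above a bit more concrete, here's a small OASIS catalog sketch (all the URIs are invented for illustration). The uri entry gives exactly the indirection a PUBLIC ID mapping would give; the chained rewriteURI entries show the kind of multi-step mapping I argue a catalog-aware resolver should follow to the end before it ever tries to fetch anything.

    <catalog xmlns="urn:oasis:names:tc:entity:xmlns:xml:catalog">
      <!-- URI-to-URI mapping: the same indirection a PUBLIC ID buys you. -->
      <uri name="http://example.org/schemas/rants.xsd"
           uri="http://mirror.example.org/schemas/rants.xsd"/>
      <!-- Chained mappings: the result of the first rewrite matches the
           second entry. My position is that resolution should keep going
           until no entry applies; only then should the resolver go out
           and resolve the final URI to a resource. -->
      <rewriteURI uriStartString="http://mirror.example.org/schemas/"
                  rewritePrefix="http://cache.example.org/schemas/"/>
      <rewriteURI uriStartString="http://cache.example.org/schemas/"
                  rewritePrefix="file:///opt/xml/schemas/"/>
    </catalog>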
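
And here's a sketch of the XInclude ID problem (file names and IDs invented for illustration): each included module is perfectly entitled to its own xml:id values, but once they're transcluded into one compound document those values can clash unless the processor rewrites the IDs and the references to them.

    <book xmlns:xi="http://www.w3.org/2001/XInclude">
      <chapter xml:id="intro">...</chapter>
      <!-- If module-a.xml and module-b.xml each also declare
           xml:id="intro" internally, the transcluded result violates ID
           uniqueness even though every input document was valid. -->
      <xi:include href="module-a.xml"/>
      <xi:include href="module-b.xml"/>
    </book>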


That's all I can think of this morning. I'm sure there are more.

Sunday, February 19, 2006

Bob DuCharme is Linked In

I've just discovered Bob DuCharme's blog. Bob is another SGML and XML person I've known for as long as I can remember. He and I have similar professional backgrounds and employment history although he has focused more on search and retrieval than on technical documentation and publishing. He definitely knows his stuff and has opinions worth listening to. He has a recent post on LinkedIn which I thought was pretty interesting (and that I got to by following a reference from Bill Trippe's blog, which was also interesting).

I'm in LinkedIn (as Eliot Kimber), mostly because it seemed harmless enough and most of the people I work with or recently worked with at Innodata Isogen are in it, and why not maintain those connections? But I haven't really done anything with it.

So far in my career I've never had to work particularly hard to find a job--I sent out four resumes as a senior in college and one of them got me hired at IBM. I worked there for 10 years. When I realized it was time to move on, the first guy I asked about a new job said "sure" and I was suddenly working for a startup (Passage Systems). When that startup started going castors to the ceiling, I had already met Carla Corkern, one of the founders of ISOGEN, and was negotiating salary (and that was nearly 10 years ago now--I joined ISOGEN in November of 1996). And that's my complete post-college work history (before that I had a series of food-service jobs and one two-week stint as a mover).

So should I ever decide I have to get a new job for some reason, I'm not sure I know how to go about it, especially considering that I don't have quite the same personal and family flexibility I did then, nor do I have the same visibility in the industry, having been less active at conferences and on public fora than I was at that time. And I haven't published a book either (not that I didn't try, but that's another story). Now I have outrageous salary requirements, I don't want to travel too much (being a new dad), I'm not moving from Austin (of course I wasn't then either, so that's not different, but I was willing to commute to San Jose or Dallas from Austin--not so much now), and so on. Much harder to find a job that will fit under those circumstances.

So maybe this LinkedIn thing isn't such a bad idea....

Anyway, Dr. Macro says check out Bob's blog, read his books, and so on.

Sunday, February 12, 2006

All Tools Suck: More Explanation

In addition to all the reasons I outline in the first post explaining why all tools suck, it's also important to understand what my normal relationship to most tools is.

I am not a typical user by almost any measure. That is, I'm not interacting with tools as a user of those tools trying to accomplish some task that those tools have been designed to support. Rather, I am interacting with those tools as an evaluator and integrator trying to make them work with other tools, trying to make them do what my clients want them to do, or extending their built-in functionality to do things they don't do out of the box.

This means that I rarely, if ever, care about what things the tools get right--if a tool gets it right then I don't care because I don't have to worry about the stuff that works, only the stuff that doesn't work.

This is a recipe for frustration and bitterness, because tools, no matter how good or how appropriate for the task, never do everything you want them to do and are never without bugs or API holes or silly user interface things--but all I see, all day, every day, is what doesn't work, what impedes my ability to satisfy my clients' requirements. [It's also important to know that Innodata Isogen is typically only hired to do stuff that is hard or that hasn't really been done before, so we're rarely doing workaday, tried-and-true stuff, but pushing the envelope, whatever it happens to be that day.]

It would be a little like being a taxi driver who only picked up mean crazy people--you'd get a very skewed view of humanity. It's sort of like that.

So that makes me pissy and bitter but nevertheless I still have to make these things work so I have no choice but to continue beating my head against these things day in and day out. Just like the taxi driver, I still have to carry the fares or I don't eat.

It also means that most of my interaction with support personnel and tool developers is about what's wrong with their tool, not what's right with it, which is unfortunate. SoftQuad, back when they were an independent company--the makers of what was Author/Editor and what became XMetal (now owned by Blast Radius)--had a person specifically assigned to handle my issue reports (it wasn't their entire job, of course, but it did take a good bit of their time from time to time). I try to be constructive when I submit issue reports against tools, providing sample data, clear descriptions of how the problem occurred, what the customer requirements were, and so on, but I'm sure that doesn't take the sting out of it.

So to all those tool developers that I have pounded with issue reports and feature requests, know that just being in a position to get those reports means, in most cases, that your software is sufficiently superior or compelling to warrant my attention at all. That is, I only spend time reporting problems with software that doesn't suck nearly as much. [Most tools I evaluate fail to meet my requirements within the first 5 minutes of my using them. This makes tool evaluation easier but doesn't speak well of software engineers generally.] Cold comfort I know, but it's just not practical to send the occasional "hey, your software hasn't crashed or impeded my integration and extension attempts at all for a whole month--good job guys, keep it up!". Not going to happen, at least not normally.

Maybe I can help to change this situation a little bit here by singling out for praise those tools that I do in fact find to not be particularly sucky, as I've already done. Of course, this can't just be a love fest....

My First Blogroll

My thanks to Bill Trippe, whose mention of this blog is the first I found when I googled "eliot kimber blog" this morning. Bill's blog looks pretty interesting. I don't think I've ever met Bill, but I meet a lot of people and I am truly horrible with names, so if we have in fact met and I've just forgotten it, I apologize. I haven't added Bill's blog to the list of other blogs here because I haven't seen enough of Bill's opinions to know if I respect them yet. However, he is a T.S. Eliot fan and a writer and, by the tone of the posts I did read, a thoughtful and intelligent dude, so I suspect that I will in fact find his opinions quite respectable. I've subscribed to his blog. And of course I have linked to it here so there is at least one more critical back link.

Bill also has AdSense-style ads on his blog and I noticed, presumably as a side effect of a long entry he wrote about his experiences with T-Mobile, that several ads for mobile phone service showed up, although none for T-Mobile, which would have pleased me more. As it happens I've been a T-Mobile customer for quite a while (or rather, my employer has been and they pay for my phone) and I've been perfectly happy with the service, although they put somebody else's outgoing message on my account a couple of weeks ago. I love computers.

And here's a technical angle to this: how hard can it be to correlate a recorded message with a phone's account in the T-Mobile master system? This is such a basic aspect of data management that I marvel that this error occurred at all. Which just further reinforces my assertion that "all tools suck". I mean really.

Thursday, February 09, 2006

Some Tools That Suck Significantly Less

Having ragged on tools in general, here's a list of a few tools that, by my analysis and experience, I can certify as "minimally sucky":

- Antenna House XSL Formatter. An XSL-FO implementation that implements essentially every useful feature of the XSL-FO recommendation with a high degree of correctness, provides needed and useful extensions, has very high engineering quality, excellent support, pretty good API, decent documentation, good platform coverage, and it does Thai (which until recently was distinguishing).

- RenderX XEP. An XSL-FO implementation that is a very close second to XSL Formatter in all respects. It's just a little less complete and correct in its feature implementation but has very high performance and comparable support and documentation quality. And now XEP does Thai too, making things a little more interesting in the XSL-FO implementation world.

To my mind, these two tools exemplify how standards-based software tools should be: they compete competently, vigorously, and fairly on value, providing tools that will serve their users well at a reasonable cost with minimal proprietary lock-in. They participate constructively in the standards process and generally make me happy.

In my many years of experience few tools have given me more pleasure and caused me less pain than these two products. Their existence has made it possible for me and my professional colleagues to achieve remarkable success in creating sophisticated, affordable, sustainable publishing systems using XSL-FO.

My genuine thanks go out to the teams at RenderX and Antenna House.

Another tool that doesn't really suck at all is the Saxon XSLT engine from Mike Kay. Saxon is remarkable in being software of the highest engineering quality that supports essentially 100% of the relevant standards and is backed by exceptional support, especially given that it's just one man doing it and it's free, open-source software. I don't think it's overstating things to say that Mike Kay is a god among men and it's not within my power to meaningfully repay him for the value that Saxon has provided me personally. It's the only XSLT engine I use both for feature reasons (it's the only implementation that provides the collator extension support I need) and for quality reasons: it is as close to a bug-free piece of non-trivial software as I've ever worked with, and it's fast. Wicked fast.
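
For the curious, the collator support I'm talking about rides on the XSLT 2.0 collation facility; the sketch below shows the shape of it, with a made-up collation URI standing in for the implementation-defined one the engine actually accepts.

    <xsl:stylesheet version="2.0"
                    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
      <xsl:template match="/terms">
        <!-- The collation attribute on xsl:sort names an implementation-defined
             collator; the URI below is a placeholder, not a real Saxon
             collation URI. -->
        <sorted>
          <xsl:for-each select="term">
            <xsl:sort select="." collation="http://example.org/collation/thai-dictionary"/>
            <xsl:copy-of select="."/>
          </xsl:for-each>
        </sorted>
      </xsl:template>
    </xsl:stylesheet>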

Thanks Mike.

There are a few other tools, maybe I'll mention them at some point. The fact that I haven't mentioned your tool doesn't mean that you don't suck less, but I'm hard pressed to think of any other tool that I depend on day-to-day for my XML-related work that has the same level of completeness and quality as these three pieces of software.

Arbortext Editor (nee Epic Editor, nee Adept Editor) is also at the top of the list: it's a solid tool that implements XML and related useful specifications with remarkable completeness. It has good integration features and documentation. Support is usually pretty good. It's a powerful tool that can solve a lot of problems. Its level of suck is pretty low, but it does not quite achieve the level of excellence of the foregoing. It does crash occasionally (but almost never loses data, or at least not a significant amount of data), and it is a little spendy (reflective of its value, but still spendy relative to its competition and what people want to spend).

Maybe next I'll discuss why all XML content management systems are, without exception, heinous piles of crap that should be avoided at all costs....

All Tools Suck

Or: why do I hate everything?

The motto of this blog (and my professional life) is "all tools suck (some suck less than others)".

That's a pretty harsh statement. After all there's lots of useful software out there (and a lot more heinous piles of crap, but that's just the human condition).

So what do I really mean by "all tools suck"?

A better question might be "what makes a tool a good tool?" The trite answer is "if it meets requirements" but in fact this is the only really useful answer--the key is defining your requirements clearly. If one of your requirements is "doesn't crash all the time" then you've just eliminated a huge swath of tools. If another one is "doesn't take proprietary ownership of my data" then you've eliminated a whole other swath.

My problem, and in fact the problem of anyone who has chosen to use standards for their data in order to get the value that standards provide because they are standard (and not, for example, simply because that's the format the tool you like happens to use), is that standards are explicit statements of requirement, and generally very demanding statements of requirement.

For example, the XSL Formatting Objects (XSL-FO) recommendation, the standard I've worked most closely with the last few years, defines a very sophisticated mechanism for doing automatic composition of printable pages. As such it defines lots of requirements for what XSL-FO should and should not do.
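
To give a feel for what that means in markup terms, here's roughly the smallest complete XSL-FO document there is (the page geometry, master name, and block text are chosen arbitrarily):

    <fo:root xmlns:fo="http://www.w3.org/1999/XSL/Format">
      <fo:layout-master-set>
        <!-- The page geometry the formatter composes content into. -->
        <fo:simple-page-master master-name="letter"
                               page-width="8.5in" page-height="11in"
                               margin="1in">
          <fo:region-body/>
        </fo:simple-page-master>
      </fo:layout-master-set>
      <fo:page-sequence master-reference="letter">
        <fo:flow flow-name="xsl-region-body">
          <fo:block font-size="12pt">All tools suck; some suck less than others.</fo:block>
        </fo:flow>
      </fo:page-sequence>
    </fo:root>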

The rub is that for any non-trivial standard few tools will completely satisfy the requirements inherent in that standard, whether it's failure to correctly implement a required feature or just not implementing an optional, but useful, feature.

Therefore, any tool that doesn't fully implement the standard at hand inherently sucks, by definition, because it doesn't meet requirements.

Of course, a given user of the standard may not require all the things the standard requires (that's why most standards have optional features), in which case, any tool that implements those features the user does require and otherwise meets the user's requirements (reliability, performance, cost, etc.) doesn't suck for that user.

But I'm an integrator. That means the set of requirements I care about is the union of all the requirements of my clients and prospects which, since I don't always know what my clients' and prospects' requirements are, is easiest to define as "everything the standard does that isn't prima facie useless".

Plus I make a point of trying to explore the practical boundaries of standards and their supporting technologies, so I tend to try to do things that aren't workaday requirements (but that would still be useful in real use cases).

As most engineers implementing tools are focused either on the requirements they understand or what the marketing department tells them is important or the features they can quickly implement, they tend not to focus on the edge cases. This is just how software development works. It's the very rare engineer who has the time, inclination, and luxury of implementing a standard completely just for the sake of completeness. In fact I'm not sure it's ever happened, at least not within the XML family of standards (the Saxon family of XSLT implementations may be the exception here--Mike Kay is one wicked mofo when it comes to standards implementation and software engineering).

So this means that my starting point when coming to a new tool that purports to support some set of standards is that it fails to support some set of requirements that I have (because it doesn't implement the whole standard [because no tool ever supports the whole standard]). So right off it's at a serious disadvantage. Then it still has to satisfy all the other requirements that any piece of useful software has to satisfy: cost, performance, correct operation, ease of use, ease of integration, etc. These requirements, by themselves, are hard enough for most software to satisfy (because most software just isn't engineered that well, so, on average, most tools you encounter will be pretty weak, just by the law of averages).

So to review:

Given that:

- By definition, all tools will fail to meet all the requirements inherent in the standards they claim to support to one degree or another, and

- Most tools are buggy, slow, bloated examples of poor software engineering

It is clear that:

- All tools suck, to one degree or another

The degree to which a tool doesn't suck is then a function of two factors:

- The number of requirements in the standard it does support and the value of those requirements to the likely users of the tools (implementing parts of standards that nobody wants or uses or should use doesn't reduce your suckiness score). [For example, the XSL-FO specification includes a bunch of features for aural presentation of rendered documents. These features are essentially useless for print applications so few, if any, FO implementations support them. That does not count against those implementations because the features are not useful to most users of XSL-FO. However, failing to support a very useful feature like block-containers does count against you.]

- The overall software engineering quality with regard to performance, bugginess, value (price relative to features provided), support, documentation, ease of integration, and so on.

For most purposes I give these two factors roughly equal weight, although for most work I give engineering quality somewhat greater weight, assuming that the critical features of the standard are otherwise implemented. But sometimes you can tolerate a slower or buggier or more bloated tool because it implements more critical features.

Finally, as an integrator I don't care just about the raw functionality of the tool but about its features that support integration, such as APIs, command-line options, packaging as plug-ins, platform support, documentation for APIs, and so on. Many tools that are otherwise quite strong often fall down here because this is stuff that relatively few users of the tool care about. So it tends to get no love (I'm talking to you, Documentum).

So on top of the usual frustrations with poor, incomplete, and incorrect implementation of standards and typically buggy and poorly-supported programs, add my frustration with trying to integrate these tools with other similarly joyful tools and you can see that my job is a recipe for bitterness and pain.

Oh yeah, and one more thing: I am freakin' Eliot Kimber! I've been doing this more years than most of you snot nosed kids with your IDEs and your AJAX hacks and your iPods have been alive so don't be telling me that my requirements are somehow less compelling than what you've figured out by reading XML for Dummies! Listen to me: implement the features I want or your software will be forever cursed! You have been warned!

Now do you understand why all tools suck?

Getting Started: Who the Hell are You and Why Should I Care What You Think?

I am starting this blog for a number of reasons. First, I need an outlet for my technical thoughts that go beyond the narrow scope of my day-to-day job. Second, I wanted to experiment with this form of communication (blogging) in a technical, rather than strictly personal, context. Finally, I'm curious about how the AdSense feature works and whether or not it is useful, moral, effective, ineffective, and so on.

So to start, who the heck am I and why do I think I have any business blogging about any subject, much less XML and stuff?

I am W. Eliot Kimber, AKA "Dr. Macro." I've been doing markup-based writing and publishing for over 25 years now. I was an early user of SGML and a long-time member of the SGML and XML standards community. I was an early user of the original HyTime standard (ISO/IEC 10744:1992) and out of that started working with Dr. Charles Goldfarb and Dr. Steve Newcomb as a co-editor of the HyTime standard and member of the SGML standard committee. Out of that work I was asked to be a founding member of what became the XML Working Group at the W3C.

In my day job I started as a technical writer at IBM's Research Triangle Park facility and migrated into a tools support role, developing tools to support the use of GML for the development of large, complex technical manuals (which is where I got the nickname "Dr. Macro" for my facility with xedit macros [the IBM mainframe editor, not the PC editor of the same name; Kedit is the PC version of the mainframe xedit]). Out of that work I was asked, along with Don Day (still of IBM and late of DITA fame) and Wayne Wohler (still at IBM as far as I know), to develop the forward-looking SGML follow-on to IBM's GML application, BookMaster. Before that I worked with Wayne as IBM's representative to the industry group that eventually became the progenitors of DocBook (I forget the name of that committee, but it involved a number of people who would later have significant input to the XML standard). I used to write a lot (I mean a lot) in the various public fora related to SGML and XML technologies (see self-serving google link below) but the last few years I've been too tired and too busy with interesting project work to post quite so much, pretty much confining my public posts to factual stuff related to XSL-FO. I used to present papers at the various SGML and XML industry conferences but not so much any more (especially now that I'm a father).

I left IBM in 1994 to work as a systems integration consultant for a small company called Passage Systems (now long dead). I've been doing essentially that job ever since, since 1996 at ISOGEN in its various incarnations. In that role I've designed a variety of systems for managing, authoring, and publishing SGML and XML to print, HTML, online help, even help files for Symbian mobile phones. I've developed DTDs and schemas, written heinous Perl scripts (death before Perl), hacked FrameMaker, beaten my head against any number of piece-of-crap content management tools, learned new standards and new programming languages, designed and implemented a content management system that wasn't quite such a piece of crap (but that got tied up in an intractable intellectual property tangle so the code still sits on a shelf, unusable by anybody), abandoned technologies that seemed to have such promise (I'm talking to you, DSSSL), and met and worked with no end of amazing people (I'm talking to you, Truly Donovan and James Clark and Norm Walsh and Jon Bosak and Tim Bray and Charles Goldfarb and Steve Newcomb and Peter Newcomb and Steve DeRose and Jeff Suttor and Joe Alfonso and Marcy Thompson and John Heintz and on and on--cue orchestra as the announcer intros the commercial break and the next award)...

Most of my career has been, in one way or another, as a developer and integrator of tools that support document authoring, management, and publishing using the available standards of the day (de jure, de facto, or corporate). This experience has been rewarding and painful. Out of the rewards I think I've gained some useful wisdom that I'd like to capture in the hope that it might be of value to others.

Out of the pain I've developed a deep vitriolic frustration and bitterness that can only find expression in outbursts of barely-controlled ranting.

This blog will reflect both the pain and the pleasure, the agony and the ecstasy. I've made more mistakes than most and had my share of successes. I've learned a lot of hard lessons. And I'm just arrogant enough to think that my opinion has serious weight (or maybe more than arrogant enough--it's a tough quality to measure accurately). I've learned how to test software well and I think I've learned how to do good software engineering. I've learned, mostly, how not to totally alienate my clients and prospects. I've learned to bite my tongue when I really wanted to say something pointed. I've learned that the smile of a little girl at the end of the day is way more important than any line of code that might or might not get written.

With this background you might reasonably ask: why do you keep trying? Why don't you just admit defeat and get a comfortable 9-5 job as a Java hack? Damn good question. Part of the answer is that I am too much of an idealist to let it go. I still feel that there is some hope for standards in a world filled with venal software vendors and I just can't stand back and let people's data get trapped in the proprietary spike pits that have historically been the norm. I'm also cheap and I just refuse to pay (or let my clients pay) huge bank for features that should not be so expensive.

So I keep trying.

My current standards-related activities include:

- Active member of the XSL Formatting Objects subgroup of the XSL Working Group (W3C)

- Active member of the DITA Technical Committee (OASIS Open)

For the last four or five years my day-to-day work has focused almost exclusively on building XSL-FO-based publishing systems for various clients, mostly in the consumer electronics area. For the last year I've been working on a new Innodata Isogen product offering that attempts to take generalized composition of XML to a new level of generality.

Here's what you can expect from this blog:

  • More or less random posts about what I think good practice is

  • Rants about tools, technologies, and practices that I think are wrong, dangerous, misguided, evil or that otherwise need ranting about

  • Praise about tools, technologies and practices that I think are right, of high value, and worthy of praise or otherwise need talking up

  • The occasional random thought with possibly only tangential relation to the technical subject at hand


What you will not find in this blog:

  • Rambling musings about my feelings, what I ate today, the music I'm listening to, or any of the other tedious personal crap that clogs so many other blogs. For that you can go to my family blog: Woods-Kimber New Family Adventure

  • Predictable frequency of posts. I've long since stopped pretending that I can be consistent about this sort of thing. So bite me.

  • Craven objectivity. For that you need to hire me as a consultant, at which point I am obligated by my employer's relationship to its partners to be fair and objective to a fault when it comes to various tools. While I would never intentionally trash a product or tool just because I'm being a dick that day, I do sometimes reveal uncomfortable truths that may be inconsistent with the marketing message of the supplier or considered opinions of the engineers responsible. Deal with it.

So what else is it useful to know about me?

So enough about me, let's get to a rant...