The Java DOM API sucks

How it sucks

I recently wrote some code to filter nodes from a DOM and had issues with NullPointerExceptions:

NodeList nl = doc.getElementsByTagName("mytag");
for(int i = 0, n = nl.getLength(); i < n; i++) {
	Element elt = (Element) nl.item(i);
	if(...) {
		elt.getParentNode().removeChild(elt);
	}
}

The problem was that NodeLists are live, and removing elements while iterating changes the indexes. When I tried to retrieve an element past the end, nl.item(i) returned null. Even changing the loop to this wouldn’t fix it: for(int i = 0; i < nl.getLength(); i++). The problem then is that if element i is removed, the element that was at index i+1 is now at index i and won’t be checked. I ended up using this code:

NodeList nl = doc.getElementsByTagName("mytag");
for(int i = 0, n = nl.getLength(); i < n; i++) {
	Element elt = (Element) nl.item(i);
	if(...) {
		elt.getParentNode().removeChild(elt);
		i--;
		n--;
	}
}
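Another way around the live-list problem is to copy the NodeList into an ordinary List before removing anything, so no index bookkeeping is needed. The following is only a sketch: the class name DomFilter, the method removeMatching, and the Predicate parameter (standing in for the elided if(...) condition) are hypothetical names chosen for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

// Hypothetical helper, not part of the JDK.
final class DomFilter {
    /** Removes every element with the given tag name that matches the predicate; returns how many were removed. */
    static int removeMatching(Document doc, String tagName, Predicate<Element> shouldRemove) {
        NodeList live = doc.getElementsByTagName(tagName);

        // Copy the live list first; the copy does not change when nodes are removed.
        List<Element> snapshot = new ArrayList<>(live.getLength());
        for (int i = 0; i < live.getLength(); i++) {
            snapshot.add((Element) live.item(i));
        }

        int removed = 0;
        for (Element elt : snapshot) {
            // Defensive check: only remove nodes that still have a parent.
            if (shouldRemove.test(elt) && elt.getParentNode() != null) {
                elt.getParentNode().removeChild(elt);
                removed++;
            }
        }
        return removed;
    }
}
```

The snapshot costs one extra pass and a small allocation, but the removal loop then reads like any other for-each loop.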

The bug above is just another example of why iterating by index is (usually) a bad idea. If the API used List<Node> instead of NodeList, this and many other problems would be solved. Unfortunately, NodeList is hardly the only place where the DOM API falls short. Look at the code commonly used to dump an XML document to a file:

Document doc = ...;
OutputStream out = new FileOutputStream(...);
TransformerFactory.newInstance().newTransformer().transform(new DOMSource(doc), new StreamResult(out));

You have to create a transformer factory (with a static method, no less; evidently constructors aren’t good enough), create a default transformer, create a Source and a Result, and pass these to the transformer. The more standard option is probably to use DOMImplementationLS:

Document doc = ...;
OutputStream out = new FileOutputStream(...);
DOMImplementationLS ls = (DOMImplementationLS) doc.getImplementation();
LSOutput lso = ls.createLSOutput();
lso.setByteStream(out);
ls.createLSSerializer().write(doc, lso);

I’m not even certain that this code is valid. The documentation on DOMImplementationLS is beyond cryptic:

The expectation is that an instance of the DOMImplementationLS interface can be obtained by using binding-specific casting methods on an instance of the DOMImplementation interface or, if the Document supports the feature “Core” version “3.0” defined in [DOM Level 3 Core], by using the method DOMImplementation.getFeature with parameter values “LS” (or “LS-Async”) and “3.0” (respectively).
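Read literally, the second route in that passage (getFeature instead of a cast) might look like the sketch below. The class name LsSerialize and the method toXml are hypothetical. Whether getFeature actually returns a DOMImplementationLS depends on the DOM implementation in use, so the code checks the result rather than assuming it.

```java
import org.w3c.dom.Document;
import org.w3c.dom.ls.DOMImplementationLS;

// Hypothetical helper, not part of the JDK.
final class LsSerialize {
    /** Serializes the document to a string via the Load and Save feature, if the implementation offers it. */
    static String toXml(Document doc) {
        // Ask the implementation for the "LS" feature, version "3.0", as the spec text describes.
        Object feature = doc.getImplementation().getFeature("LS", "3.0");
        if (!(feature instanceof DOMImplementationLS)) {
            throw new UnsupportedOperationException("DOM implementation does not offer LS 3.0");
        }
        DOMImplementationLS ls = (DOMImplementationLS) feature;
        return ls.createLSSerializer().writeToString(doc);
    }
}
```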

Seriously? What would be wrong with:

Document doc = ...;
OutputStream out = new FileOutputStream(...);
new DOMWriter().write(doc, out);
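No such class exists in the JDK, but a DOMWriter like this could be sketched as a thin wrapper over the transformer code shown earlier. Everything here is an assumption made for illustration: the class itself, the setIndent option, and the choice to rethrow TransformerException as an unchecked exception.

```java
import java.io.OutputStream;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;

// Hypothetical convenience class, not part of the JDK.
final class DOMWriter {
    private boolean indent = false;

    /** Optional formatting switch; returns this so calls can be chained. */
    DOMWriter setIndent(boolean indent) {
        this.indent = indent;
        return this;
    }

    /** Serializes the document to the stream using an identity transform. */
    void write(Document doc, OutputStream out) {
        try {
            Transformer t = TransformerFactory.newInstance().newTransformer();
            t.setOutputProperty(OutputKeys.INDENT, indent ? "yes" : "no");
            t.transform(new DOMSource(doc), new StreamResult(out));
        } catch (TransformerException e) {
            // Wrap the checked exception so callers are not forced to handle it.
            throw new IllegalStateException("Could not serialize document", e);
        }
    }
}
```

The factories are still there, but the caller sees only a constructor, an optional setter, and write.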

I understand that namespaces, formatting, etc. need to be customizable, which is why I allow for a DOMWriter class, but there’s no need to involve factories, static factory methods for creating factories, transformations, or various implementations. Further, why does it seem that I need level 3 of a feature to serialize a document? Parsing and serializing should have been the very first thing implemented in the very first version of any DOM API.

There’s also the fact that if you use a new version of Java and an old version of Tomcat, XPathFactory.newInstance() will fail with the exception: java.lang.RuntimeException: XPathFactory#newInstance() failed to create an XPathFactory for the default object model: http://java.sun.com/jaxp/xpath/dom with the XPathFactoryConfigurationException: javax.xml.xpath.XPathFactoryConfigurationException: No XPathFctory implementation found for the object model: http://java.sun.com/jaxp/xpath/dom . The funny/ironic/annoying part is that the Javadoc for that method states “this method will never fail.” This is caused in part by the fact that the DOM API allows multiple implementations to be used. The implementations have to be statically registered, which causes no end of headaches with the security API and the image API. The designers should have learned by now that this idea does not work.

The examples don’t stop there, but I think I’ve made my point.

Why it sucks

A large part of the reason for the API’s moronic design is the W3C DOM spec. I generally like the W3C and its work, but the DOM spec should not even exist. Its fatal flaw is that it is designed for multiple programming languages (including Java and ECMAScript, commonly referred to as JavaScript, as well as potentially C and C++). There is too large a mismatch between the languages for such a large API to be successful. ECMAScript and C do not support namespaces. Java does not support instance fields (attributes) in interfaces. Java and ECMAScript do not support manual memory management, whereas C requires it. Java and C++ use class-based inheritance, while ECMAScript uses prototype-based inheritance, and C does not support OO at all. More to the point, you can’t even write to the console using the same code in each language, so why bother creating a common DOM API?

Even given the specification, the language bindings are worse than necessary. DOMString is bound to java.lang.String, so obviously the names do not need to match exactly. DOMTimeStamp is bound to long in Java and Date in ECMAScript, so obviously bindings don’t even need the same interface. Since the specification provides flexibility, why not bind NodeList to List<Node>? It would eliminate the need to iterate by index, and you would automatically have the Java collections API available to shuffle, sort, extract sublists, etc.
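To give a sense of how little a List<Node> binding would take, here is a sketch of a read-only adapter built on AbstractList. The names NodeLists and asList are hypothetical. The view stays live, just like the NodeList it wraps, so this is not how an official binding would necessarily behave.

```java
import java.util.AbstractList;
import java.util.List;
import java.util.RandomAccess;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

// Hypothetical adapter, not part of the JDK.
final class NodeLists {
    /** Returns a read-only List view backed by the given (possibly live) NodeList. */
    static List<Node> asList(NodeList nodes) {
        return new NodeListView(nodes);
    }

    private static final class NodeListView extends AbstractList<Node> implements RandomAccess {
        private final NodeList nodes;

        NodeListView(NodeList nodes) {
            this.nodes = nodes;
        }

        @Override
        public Node get(int index) {
            Node n = nodes.item(index);
            if (n == null) {
                // NodeList.item returns null for an invalid index; List.get must throw instead.
                throw new IndexOutOfBoundsException("Index: " + index);
            }
            return n;
        }

        @Override
        public int size() {
            return nodes.getLength();
        }
    }
}
```

With this in place, for (Node n : NodeLists.asList(nl)) works directly. Since the view does not support modification, sorting or shuffling needs a copy such as new ArrayList<>(NodeLists.asList(nl)), which also serves as a snapshot that is safe to iterate over while removing nodes.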

How to fix it

Unfortunately, the best thing would have been never to create the specification and to leave DOM design to the individual languages. At this point, since it’s already a specification for Java and ECMAScript, the closest thing would be to deprecate the old API in each language and develop a new one from scratch. However, that’s not feasible either. It would be very inefficient but technically possible to do so in Java. In ECMAScript, though, due to its lack of namespaces, it would be impossible without using names like Node2, Document2, etc.

I propose that we deprecate the specification itself and allow the languages to evolve independently. In Java specifically, we should deprecate the most unreasonable parts of the API and have people that know what they’re doing design replacements.

One Comment

  1. Posted January 3, 2013 at 9:10 pm

    Yes, it does really suck. I personally refer to SAX as the Shitty API for XML. I would be perfectly fine with it being complicated if I could always get an attribute value out of it. I have to process lots of SVGs at my job, and I find myself resorting to regexes to hack the values I want out of the document so I can finish my work. The Java SAX and DOM are the worst APIs in any language I have used.
