I finally decided to look deeper into the XPath performance problem I previously blogged about. I found the offending code in the JDK source (XPathImpl.eval(String, Object)) and later came across bug #6344064, filed in October 2005, which confirms the same behaviour I experienced and points to the same lines of code.
In using Xalan to implement JAXP in Java 1.5, Sun hides Xalan’s concept of XPathContext and creates a new one for each XPath evaluation. XPathContext holds Xalan’s enumerated representation of the DOM, so with a fresh context for every evaluation, Xalan has to enumerate each element in the DOM all over again. Enumerating the DOM elements is quite fast, but if you have:
- a large DOM
- XPath expressions that each search only on a small part of the DOM
- many XPath expressions
then the performance difference between naively applying the XPath expressions to the entire DOM versus breaking the DOM up into smaller subtrees and applying the XPath expressions in a more targeted fashion is quite noticeable.
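The post doesn't show exactly how the DOM was broken up, so here is a minimal sketch of one way to do it, assuming the standard JAXP APIs (`DocumentBuilder`, `Document.importNode`, `XPathExpression`). Each subtree is copied into its own small `Document`, and the XPath expression is evaluated against that copy, so a fresh per-evaluation context only has to cover the subtree rather than the whole DOM. The class `SubtreeXPath`, the helpers `detachedCopy` and `countPerSubtree`, and the sample `<orders>` XML are hypothetical illustrations, not code from the JDK or from the bug report.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpression;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class SubtreeXPath {

    // Hypothetical helper: deep-copies one subtree into a brand-new, small
    // Document, so an XPath evaluation against the copy only sees its nodes.
    static Document detachedCopy(DocumentBuilder builder, Node subtree) {
        Document small = builder.newDocument();
        small.appendChild(small.importNode(subtree, true));
        return small;
    }

    // Hypothetical helper: evaluates a numeric XPath expression once per
    // element named tagName, each time against a small per-subtree copy.
    static int[] countPerSubtree(String xml, String tagName, String expr) throws Exception {
        DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        Document big = builder.parse(new InputSource(new StringReader(xml)));

        // Compiling once avoids re-parsing the expression; it is assumed here
        // that it does not, by itself, avoid a fresh evaluation context per call.
        XPathExpression compiled = XPathFactory.newInstance().newXPath().compile(expr);

        NodeList subtrees = big.getElementsByTagName(tagName);
        int[] results = new int[subtrees.getLength()];
        for (int i = 0; i < subtrees.getLength(); i++) {
            Document small = detachedCopy(builder, subtrees.item(i));
            // XPathConstants.NUMBER results come back as java.lang.Double.
            Double n = (Double) compiled.evaluate(small, XPathConstants.NUMBER);
            results[i] = n.intValue();
        }
        return results;
    }

    public static void main(String[] args) throws Exception {
        String xml = "<orders>"
                + "<order id='1'><item>apple</item><item>pear</item></order>"
                + "<order id='2'><item>plum</item></order>"
                + "</orders>";
        // Each copied <order> becomes the root of its own Document,
        // hence the absolute path /order/item.
        int[] counts = countPerSubtree(xml, "order", "count(/order/item)");
        for (int i = 0; i < counts.length; i++) {
            System.out.println("order " + (i + 1) + ": " + counts[i] + " item(s)");
        }
    }
}
```

Copying each subtree has its own one-off cost, so this trade-off only pays off when several expressions are run against each subtree of a large DOM, which is exactly the combination of conditions listed above.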