
Word count in XML documents

I recently needed to count the words in an XML-based book I was writing. Because the markup gets counted along with the text, this isn't as simple as it might seem. I eventually found this article:
Tip: Computing word count in XML documents.

The technique works well, although the word count isn't entirely precise. It's close enough for my purposes, though, and here's the command line I use to calculate a single word count across the entire book (which is broken into many separate DITA source files):

$ xsltproc --novalid *.dita | wc -w

The XSL file referenced in the command is a copy of the one from the article; I posted it to a web server so that I can reach it from wherever I'm working. xsltproc comes pre-installed on Mac OS X 10.6 Snow Leopard.
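
If xsltproc isn't available, roughly the same idea can be sketched in Python. This is not the stylesheet from the article, just a minimal illustration of the same approach: throw away the markup, keep the text, and count whitespace-separated tokens the way wc -w does. The function names are my own, and it assumes the files parse without their DTD (an entity defined only in the DTD would raise a parse error).

```python
import sys
import xml.etree.ElementTree as ET


def count_words_in_xml(xml_text):
    """Count whitespace-separated words in the text content of an XML string.

    Tags, attributes, and comments are ignored; only text nodes count.
    """
    root = ET.fromstring(xml_text)
    # Join text nodes without a separator, so the result matches what a
    # text-only transform piped to `wc -w` would see.
    text = "".join(root.itertext())
    return len(text.split())


def count_words_in_files(paths):
    """Sum the word counts of several XML files (hypothetical helper)."""
    total = 0
    for path in paths:
        with open(path, "rb") as handle:
            total += count_words_in_xml(handle.read())
    return total


if __name__ == "__main__":
    # Usage: python wordcount.py *.dita
    print(count_words_in_files(sys.argv[1:]))
```

Like the xsltproc version, this counts everything that appears as text, including titles and index terms, so the total is an approximation rather than an exact figure.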

