Friday, 22 July 2011

The perils of word counts in LaTeX

I'm currently writing up my thesis. I am writing it in LaTeX, which for the uninitiated means that rather than writing it in Word or similar, I write it in a text file and put lots of formatting commands around the text to say `this bit should be emphasised' or `this bit is a subsection'. LaTeX then produces a pretty PDF file from my text file.

Like any self-respecting numbers girl, I like to keep a watch on how many words I have written that day, and in total. Although I don't have a word limit to adhere to, hitting a good word count for a day makes me feel like I've done good work, even if a lot of it gets cut later...

For a while now I've been dubious about the word counter I've been using: the built-in Statistics word counter in TeXShop, which works using detex | wc -w. In plain English, it strips the thesis file of all the formatting commands and then counts the number of words left. According to this word counter, I have 61138 words, but I've noticed that this total tends to fluctuate: there have been days where I've written a lot and have ended up with fewer words than I started with.
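To make the "strip the commands, then count what's left" idea concrete, here is a minimal Python sketch. It is emphatically not how detex works internally: the function names and the regular expressions are my own rough assumptions, and they will miscount plenty of real LaTeX (maths, escaped characters like \%, command arguments that aren't really prose). It just shows the general shape of the approach.

```python
import re

def crude_detex(source: str) -> str:
    """Rough, hypothetical approximation of stripping LaTeX markup.

    This is NOT detex's real algorithm, only an illustration.
    """
    # Drop comments: an unescaped % and everything after it on that line.
    text = re.sub(r"(?<!\\)%.*", "", source)
    # Drop \begin{...} and \end{...} markers, including the environment name.
    text = re.sub(r"\\(begin|end)\{[^}]*\}", " ", text)
    # Drop command names such as \section or \emph, plus any [optional] argument.
    text = re.sub(r"\\[a-zA-Z]+\*?(\[[^\]]*\])?", " ", text)
    # Remove braces but keep their contents, so \emph{word}. stays one token.
    text = re.sub(r"[{}]", "", text)
    # Turn other leftover markup characters into spaces.
    text = re.sub(r"[$&~^_\\]", " ", text)
    return text

def count_words(source: str) -> int:
    """Count whitespace-separated tokens after stripping, roughly like wc -w."""
    return len(crude_detex(source).split())

if __name__ == "__main__":
    sample = r"\section{Intro} This bit is \emph{emphasised}. % a comment here"
    print(len(sample.split()))   # naive count on the raw source
    print(count_words(sample))   # count after crude stripping
```

Even on this one-line sample the raw count and the stripped count disagree, and small changes to the stripping rules (for example, whether a removed brace becomes a space) shift the total again, which is one plausible way for a count to wobble.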

Trying some alternatives out:

  • texcount *.tex: 60239 in text + 1457 in headers + 2469 in captions = 64165
  • ps2ascii thesis.pdf | wc -w: 83415 (but this includes the bibliography, which is currently 7686 words, and any appendix text)
  • copying and pasting the text into OpenOffice: 75222
  • copying and pasting the text into Word: 74664
  • copying and pasting the text into TextWrangler (a Mac text editor): 74093

So I have anywhere between 61000 and 83000 words. TeXShop's Statistics is instant but inaccurate. texcount takes about 30 seconds to process, and then the three subtotals have to be added up by hand. ps2ascii also takes a little while and doesn't separate the thesis text from the bibliography and appendices. Copying and pasting, sadly, looks the most accurate (though now I have no idea how many words I've actually got...)
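Part of the spread probably comes down to what each tool decides a "word" is. As a small illustration, here is a Python sketch with three made-up counting rules applied to the same sentence. I don't know which rules Word, OpenOffice or TextWrangler actually use, so the function names and definitions below are purely hypothetical.

```python
import re

def count_whitespace(text: str) -> int:
    """Count whitespace-separated tokens, roughly what wc -w does."""
    return len(text.split())

def count_alnum_runs(text: str) -> int:
    """Count runs of letters/digits, so hyphens and apostrophes split words."""
    return len(re.findall(r"[A-Za-z0-9]+", text))

def count_tokens_with_letters(text: str) -> int:
    """Count whitespace-separated tokens containing at least one letter or digit."""
    return sum(1 for tok in text.split() if re.search(r"[A-Za-z0-9]", tok))

if __name__ == "__main__":
    sample = "A well-known result - it doesn't hold (see p. 42)."
    print(count_whitespace(sample))           # 10
    print(count_alnum_runs(sample))           # 11
    print(count_tokens_with_letters(sample))  # 9
```

Three reasonable-sounding rules give three different answers for a ten-ish word sentence. Scale a difference like that up to a whole thesis and a gap of several hundred words between two word processors stops being surprising.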

What I mostly want is a tool that measures progress very quickly, for some motivation - so I'll probably stick with TeXShop's Statistics or use texcount, after all of that!

As my housemate says, however,
it doesn't really matter if it's 60 or 80 or 100,000 words, just finish it!

Right then: enough procrastination, back to the writing...
PS Must mention the excellent LaTeX tutorials written by Andrew Roberts.


  1. I'm just amazed you managed to get Word and OpenOffice to yield the same result!

  2. Ah, actually I didn't, I'd closed the OpenOffice window by the time I wrote this blog and couldn't remember what the count was. Let's update...

  3. Ah, that's more like it! Haha! I'm constantly annoyed by that. Why can't they just agree a standard cross-platform word-boundary regex definition for word processors to use?

  4. I guess if you're looking for an easy way to remove 1% or so from your Word document word count total then OpenOffice comes along to save the day!