Write-Only Code: A Rant: Comments in Regex

I've long followed the tenet that comments in code should explain why, not how. This tenet is founded on years of debugging code. When a programmer writes a comment explaining how something works, they are actually saying how they think it should work, which is rarely how it actually works. Comments of this type are harmful: they can prejudice the reader's understanding and hide flaws that might otherwise be visible. Since I can't know what's in a comment until I read it, I try to read the code first, without the comments, to form my own idea of its workings.

Comments in regular expressions typically say how something matches, not why. When a regular expression is not matching as expected, this sort of comment is worse than useless, because it taints expectations of the code at hand. Given the terseness and complexity of large regular expressions, it hardly matters whether the author of the expression and comments is someone else or the reader. When looking at my old code, I regularly have to stop and think about a regex, but I believe that it is time well spent. Every time I find a bug, it reinforces that programming is a human endeavor, and as such is never perfect. Recently, I used one of my old scripts as an example in the SELF 2011 talk. This is a script I've used for years, and as I was looking at it on the slide, I saw where I had put s,^./,, when I meant s,^\./,,. The unescaped dot matches any character, not just a literal period.
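To make the difference between those two substitutions concrete, here is a minimal sketch in Ruby. The original script used a sed/Perl-style s,,, substitution, so the method names and sample paths below are hypothetical, chosen only for illustration.

```ruby
# Hypothetical helpers mirroring s,^./,, and s,^\./,, from the post.
# %r{...} is used so the slash needs no escaping.

def strip_prefix_buggy(path)
  # Unescaped dot: matches ANY single character followed by a slash.
  path.sub(%r{^./}, "")
end

def strip_prefix_fixed(path)
  # Escaped dot: matches only a literal "./" at the start.
  path.sub(%r{^\./}, "")
end

puts strip_prefix_buggy("./notes.txt")  # notes.txt
puts strip_prefix_buggy("a/notes.txt")  # notes.txt   (the "a/" is silently eaten)
puts strip_prefix_fixed("./notes.txt")  # notes.txt
puts strip_prefix_fixed("a/notes.txt")  # a/notes.txt
```

Both versions agree on every path that really starts with "./", which is likely why a bug like this can survive years of use.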

Abstraction is a tool for managing complexity, and it can be used with regular expressions too. Here's an example from a Ruby class:

numrange = /\d+(?:-\d+)?/
numlist = /#{numrange}(?:,#{numrange})*/
step = /\/\d+/
numspec = /(?:\*|#{numlist})(?:#{step})?/

This code tries to be self-documenting, so the intent is explicit in the choice of names. Each name is akin to a comment, but the pattern behind it is simple enough to be validated at a glance. Since more complex patterns are built from prior abstractions, each can be understood and validated with little effort.
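One way to do that validation is to check each named piece on its own against a few strings. The following is a rough sketch, not part of the original class: the full_match? helper and the sample inputs are assumptions made for illustration.

```ruby
numrange = /\d+(?:-\d+)?/
numlist  = /#{numrange}(?:,#{numrange})*/
step     = /\/\d+/
numspec  = /(?:\*|#{numlist})(?:#{step})?/

# Hypothetical helper: true only if the pattern matches the entire string.
# Interpolating a Regexp wraps it in its own group, so anchoring is safe.
def full_match?(pattern, text)
  /\A#{pattern}\z/.match?(text)
end

# Each abstraction is checked by itself before trusting the next one.
puts full_match?(numrange, "1-5")        # true
puts full_match?(numrange, "1-")         # false
puts full_match?(numlist,  "1,3-5,7")    # true
puts full_match?(numlist,  "1,,2")       # false
puts full_match?(step,     "/15")        # true
puts full_match?(numspec,  "*/15")       # true
puts full_match?(numspec,  "1-5,10/2")   # true
puts full_match?(numspec,  "*5")         # false
```

Because each pattern is small, a failing check points at one short expression instead of one long, unreadable one.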


Friday, June 17, 2011
