Friday, June 17, 2011

A Rant: Comments in Regex

I've followed a tenet for a long time that comments in code should explain why, not how. This tenet is founded on years of debugging code. When programmers write a comment explaining how something works, they are really saying how they think it should work, which is rarely how it actually works. These kinds of comments are harmful because they can prejudice the reader's understanding and hide flaws that might otherwise be visible. Since I can't know what's in a comment until I read it, I try to read the code first, without the comments, to form my own idea of how it works.

Comments in regular expressions typically say how something matches, not why. When a regular expression is not matching as expected, these sorts of comments are worse than useless, because they taint the reader's expectations of the code at hand. Given the terseness and complexity of large regular expressions, it hardly matters whether the author of the expression and its comments is someone other than the reader or the reader themselves. When looking at my old code, I regularly have to stop and think about a regex, but I believe that is time well spent. Every time I find a bug, it reinforces that programming is a human endeavor, and as such is never perfect. Recently, I used one of my old scripts as an example in my SELF 2011 talk. This is a script I've used for years, and as I was looking at it on the slide, I saw that I had written s,^./,, when I meant s,^\./,,.
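To make the cost of that missing backslash concrete, here is a small sketch in Ruby. It assumes the intent was to strip a leading "./" from a path; the method names and sample paths are hypothetical, and String#sub stands in for the s,,, substitution from the script.

```ruby
# Buggy version: the unescaped "." matches ANY character,
# so a path like "a/b.rb" silently loses its "a/" as well.
def strip_dot_slash_buggy(path)
  path.sub(%r{^./}, '')
end

# Intended version: "\." matches only a literal dot.
def strip_dot_slash(path)
  path.sub(%r{^\./}, '')
end

['./lib/foo.rb', 'a/b.rb', 'lib/foo.rb'].each do |p|
  puts "#{p.ljust(14)} buggy: #{strip_dot_slash_buggy(p).ljust(12)} fixed: #{strip_dot_slash(p)}"
end
```

Both versions agree on "./lib/foo.rb" and "lib/foo.rb", which is exactly why a bug like this can survive for years: it only shows up on paths whose first component is a single character.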

Abstraction is a tool for managing complexity, and it can be used with regular expressions too. Here's an example from a Ruby class:

numrange = /\d+(?:-\d+)?/
numlist = /#{numrange}(?:,#{numrange})*/
step = /\/\d+/
numspec = /(?:\*|#{numlist})(?:#{step})?/

This code tries to be self-documenting, so the intent is made explicit through the choice of names. This is akin to a comment, but simple enough to be validated instantly. Since more complex patterns are built from prior abstractions, each can be understood and validated with little effort.
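One way to validate the pieces is to exercise them directly. The sketch below reuses the four patterns above unchanged; the anchored `field` pattern and the sample strings are my own additions, chosen on the assumption that numspec describes something like a cron-style schedule field.

```ruby
# Building blocks, each defined in terms of the ones before it.
numrange = /\d+(?:-\d+)?/                      # "5" or "1-5"
numlist  = /#{numrange}(?:,#{numrange})*/      # "1,3,5-7"
step     = /\/\d+/                             # "/15"
numspec  = /(?:\*|#{numlist})(?:#{step})?/     # "*", "*/15", "1-5/2"

# Added for this sketch: anchor with \A and \z so the whole
# string must be a numspec, not merely contain one.
field = /\A#{numspec}\z/

%w[* */15 5 1-5 1,3,5-7 0-23/2 1- ,5 */].each do |spec|
  puts "#{spec.ljust(8)} #{field.match?(spec) ? 'valid' : 'invalid'}"
end
```

Interpolating a Regexp into another Regexp embeds it as a self-contained group, so each name can be tested in isolation before it is composed into the next one.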

Monday, June 13, 2011

My fond regards to PHP

Advanced regex with PHP

Some love for the PHP world in this one... and a reminder to myself that a rant on comments in regex is overdue.

Sunday, June 12, 2011

Slides from Southeast LinuxFest 2011 talk

Regular Expression Practice

UPDATE: Here is a video of the talk and the link:

Lazy day links

The talk at SELF 2011 is done - Whew! I'll get around to posting the slides soon, but for now, I just wanna be lazy and post a link or two.

"Crucial Concepts Behind Advanced Regular Expressions" offers an interesting mix of concepts, ranging from everyday usage (greedy/non-greedy, and word boundaries) to the esoteric (atomic groups, callbacks, and recursion). I don't agree that the concepts are all crucial, but I thought it was a good read.
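For a quick taste of three of those concepts, here is a small Ruby sketch. It is not taken from the linked article; the sample strings are arbitrary, and it relies on Ruby's regex engine supporting atomic groups via (?>...).

```ruby
quoted = '"one" and "two"'

# Greedy: .* grabs as much as it can, so the match runs to the LAST quote.
greedy = quoted[/".*"/]
# Non-greedy: .*? stops at the FIRST quote that lets the match succeed.
lazy = quoted[/".*?"/]

# Word boundaries: \b keeps "cat" from matching inside "concatenate".
words = 'cat concatenate cat'.scan(/\bcat\b/)

# Atomic group: once (?>a+) has matched, it never gives characters back,
# so the trailing "a" can never match.
plain  = 'aaa' =~ /a+a/      # backtracking lets a+ give one "a" back
atomic = 'aaa' =~ /(?>a+)a/  # no backtracking into the group, so no match

puts greedy, lazy, words.inspect, plain.inspect, atomic.inspect
```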

Humor: Parsing HTML with Regex - funny!!