Write-Only Code
Regular Expressions - pushing the limits of comprehensibility ;)
Saturday, June 8, 2013
Tuesday, June 19, 2012
Regular Expressions that Cheat
Recently, I was writing some expressions to normalize a log file, and found myself "cheating." The question then is "When is it OK to cheat?" I think it's OK to cheat when you know the input well, and know that it is well formed. This might be the case, for example, if I'm parsing the output of a program that I wrote. Then I should have a pretty good idea of all the possible outputs of the program.
As it happens, the log file I was normalizing was not very well known to me. I went back and rewrote several expressions. Here's one of the rewritten expressions:
^20\d{2}-(0[1-9]|1[0-2])-(0[1-9]|[12][0-9]|3[01]) ([01][0-9]|2[0-3]):[0-5][0-9]:[0-5][0-9],\d{5}
This was to match a timestamp with microseconds. There's a similar regex to match IPv4 addresses in the SELF slides.
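To make the difference concrete, here is a minimal Ruby sketch that runs the strict expression next to a "cheating" one. The LOOSE pattern and both sample lines are made up for illustration; they are not from the original log or script.

```ruby
# The strict pattern from the post, unchanged. Note that \d{5} is not
# anchored at the end, so it is satisfied by the first five digits of a
# longer fractional part.
STRICT = /^20\d{2}-(0[1-9]|1[0-2])-(0[1-9]|[12][0-9]|3[01]) ([01][0-9]|2[0-3]):[0-5][0-9]:[0-5][0-9],\d{5}/

# Hypothetical "cheating" pattern: checks the shape, not the ranges.
LOOSE = /^\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d+/

# Made-up sample lines (assumptions, not real log data).
GOOD_LINE = "2012-06-19 14:03:27,123456 INFO started"
BAD_LINE  = "2012-13-45 25:61:61,123456 INFO started"

# True when the pattern matches somewhere in the line.
def matches?(pattern, line)
  !pattern.match(line).nil?
end

puts matches?(STRICT, GOOD_LINE)  # true
puts matches?(STRICT, BAD_LINE)   # false: month 13 is rejected
puts matches?(LOOSE, BAD_LINE)    # true: the cheat accepts nonsense
```

The cheat is fine as long as nothing like the second line can ever show up in the input, which is exactly the "know your input" condition.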
If anyone feels differently about cheating, feel free to comment.
Monday, June 11, 2012
SELF 2012 Retrospective
The ride home to Georgia on the motorcycle had some exciting moments too. I got rained on a couple of times, but nothing so bad as to make me want to stop and put on my rain gear. I did notice I'd lost one of the bolts holding on my windshield. I occasionally had these little panicky thoughts: what if it flies off and hits me while I'm going 75 mph down the highway? Still, if I'd wanted a completely boring trip, I'd have taken a Greyhound bus.
Here's a review of some things I said that weren't in the slides:
- + is like * but means 1 or more repetitions
- {N} means exactly N repetitions
- {N,} means N or more repetitions
- {,N} means 0 up to N repetitions
- {N,M} means from N to M repetitions
Depending on the flavor of regular expressions you're dealing with, you may need to put a \ in front of the curly braces to use the above behavior.
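The quantifiers above can be checked with a short Ruby sketch. Ruby is just one flavor; it accepts all five forms with unescaped braces, which is not true everywhere. The CASES table is made up for illustration.

```ruby
# Each row: pattern, input, and whether the whole input should match.
# \A and \z anchor the pattern to the entire string.
CASES = [
  [/\Aa+\z/,     "",      false],  # + needs at least one repetition
  [/\Aa+\z/,     "aaa",   true],
  [/\Aa{3}\z/,   "aaa",   true],   # {N}: exactly N
  [/\Aa{3}\z/,   "aaaa",  false],
  [/\Aa{2,}\z/,  "a",     false],  # {N,}: N or more
  [/\Aa{2,}\z/,  "aaaaa", true],
  [/\Aa{,2}\z/,  "",      true],   # {,N}: 0 up to N
  [/\Aa{,2}\z/,  "aaa",   false],
  [/\Aa{2,4}\z/, "aaa",   true],   # {N,M}: from N to M
  [/\Aa{2,4}\z/, "aaaaa", false],
]

CASES.each do |pattern, text, expected|
  actual = pattern.match?(text)
  status = actual == expected ? "ok" : "MISMATCH"
  puts "#{pattern.inspect} against #{text.inspect}: #{actual} (#{status})"
end
```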
I'd love to hear people's thoughts on the conference, or any questions. Thanks for sharing.
Thursday, June 7, 2012
Friday, November 11, 2011
Quick Tips - grep ps output
Instead of this tired old command line trope:
ps -ef | grep foo | grep -v grep
try this:
ps -ef | grep '[f]oo'
This works because the regex [f]oo contains a character class matching the single letter f, so it matches the string foo in the ps output. It does not match the grep process itself, because grep's own command line contains the literal text [f]oo, and the character class [f] does not match the string [f]. Here's the same trick with vim:
$ ps -ef | grep vim
dg 14823 14739 0 15:12 pts/0 00:00:00 grep vim
dg 21295 13905 0 Nov09 pts/0 00:00:01 vim -R main.c
$ ps -ef | grep vim | grep -v grep
dg 21295 13905 0 Nov09 pts/0 00:00:01 vim -R main.c
$ ps -ef | grep '[v]im'
dg 21295 13905 0 Nov09 pts/0 00:00:01 vim -R main.c
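The same effect can be reproduced without a real process table. This is a rough Ruby simulation, assuming that Ruby's regex engine and grep agree on a pattern this simple; simulated_ps_grep is a hypothetical helper, and the process lines are shaped like the vim lines in the ps output shown in this post.

```ruby
VIM_LINE = "dg 21295 13905 0 Nov09 pts/0 00:00:01 vim -R main.c"

# Build the process list as it would look while `grep <pattern>` is running,
# then filter that list with the same pattern, as the pipeline would.
def simulated_ps_grep(pattern_source)
  grep_line = "dg 14823 14739 0 15:12 pts/0 00:00:00 grep #{pattern_source}"
  pattern = Regexp.new(pattern_source)
  [grep_line, VIM_LINE].select { |line| pattern.match?(line) }
end

puts simulated_ps_grep("vim").length    # 2: grep finds itself too
puts simulated_ps_grep("[v]im").length  # 1: only the real vim process
```

With the bracketed pattern, grep's own entry reads "grep [v]im", and the three characters v, i, m never appear next to each other in it.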
Friday, June 17, 2011
A Rant: Comments in Regex
Comments in regular expressions typically say how something matches, not why. When a regular expression is not matching as expected, this sort of comment is worse than useless, tainting expectations of the code at hand. Given the terseness and complexity of large regular expressions, it hardly matters whether the author of the expression and comments is separate from the reader or one and the same.
When looking at my old code, I regularly have to stop and think about a regex, but I believe that it is time well spent. Every time I find a bug, it reinforces that programming is a human endeavor, and as such is never perfect. Recently, I used one of my old scripts as an example in the SELF 2011 talk. This is a script I've used for years, and as I was looking at it on the slide, I saw where I had put s,^./,, when I meant s,^\./,,.
Abstraction is a tool for managing complexity, and it can be used with regular expressions too. Here's an example from a Ruby class:
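For anyone who doesn't see the difference between those two substitutions at a glance, here is a small Ruby sketch of the same bug. The method names and file paths are invented for illustration; the original script is not shown here.

```ruby
# Buggy version: the unescaped . matches ANY single character before the slash.
def strip_buggy(path)
  path.sub(%r{^./}, "")
end

# Intended version: \. matches only a literal dot.
def strip_fixed(path)
  path.sub(%r{^\./}, "")
end

puts strip_buggy("./notes.txt")  # notes.txt
puts strip_buggy("a/notes.txt")  # notes.txt   (wrong: the directory "a" is eaten)
puts strip_fixed("./notes.txt")  # notes.txt
puts strip_fixed("a/notes.txt")  # a/notes.txt
```

Both versions behave identically on the input I had in mind, which is how a bug like this can survive for years.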
numrange = /\d+(?:-\d+)?/
numlist = /#{numrange}(?:,#{numrange})*/
step = /\/\d+/
numspec = /(?:\*|#{numlist})(?:#{step})?/
This code tries to be self-documenting, so the intent is explicit in the choice of names. This is akin to a comment, but each piece is simple enough to be validated at a glance. Since more complex patterns are built using prior abstractions, each can be understood and validated with little effort.
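Here is a sketch of how the composed pattern might be used and checked. The four definitions are repeated from above so the snippet runs on its own; the FIELD constant, the valid_field? method, and the sample strings are my additions, not part of the original class.

```ruby
numrange = /\d+(?:-\d+)?/
numlist = /#{numrange}(?:,#{numrange})*/
step = /\/\d+/
numspec = /(?:\*|#{numlist})(?:#{step})?/

# Hypothetical wrapper: anchor numspec so the entire string must match.
FIELD = /\A#{numspec}\z/

def valid_field?(text)
  FIELD.match?(text)
end

puts valid_field?("*/5")            # true:  star with a step
puts valid_field?("1-5,7,10-12/2")  # true:  list of numbers and ranges, with a step
puts valid_field?("1-")             # false: dangling range
puts valid_field?("*,5")            # false: star cannot be part of a list
```

Because each layer is a named regex, a failing case can be narrowed down by testing numrange, numlist, and step on their own.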
Monday, June 13, 2011
My fond regards to PHP
Some love for the php world in this one... and a reminder to myself that a rant on comments in regex is overdue.