Recently, I was writing some expressions to normalize a log file, and found myself "cheating." The question then is "When is it OK to cheat?" I think it's OK to cheat when you know the input well, and know that it is well formed. This might be the case, for example, if I'm parsing the output of a program that I wrote. Then I should have a pretty good idea of all the possible outputs of the program.
As it happens, the log file I was normalizing was not very well known to me. I went back and rewrote several expressions. Here's one of the rewritten expressions:
^20\d{2}-(0[1-9]|1[0-2])-(0[1-9]|[12][0-9]|3[01]) ([01][0-9]|2[0-3]):[0-5][0-9]:[0-5][0-9],\d{5}
This was to match a timestamp with microseconds. There's a similar regex to match IPv4 addresses in the SELF slides.
Anyone feel differently about cheating - feel free to comment.