Friday, April 29, 2011

Handling Double Quotes in Regex for Bash-Style Variables

Expanding on Sunday's post, I want to add double quote handling to the regex for Bash-style variables. This can be done much like the single quote handling, with a couple of wrinkles. A double-quoted string may contain escaped quotes, which do not terminate the string but instead put literal double quote characters into it. The escape character is the backslash. A double-quoted string may also contain single quotes, which are interpreted literally.

$ BAZ="\"DON'T PANIC\" in large, friendly letters"
$ echo $BAZ
"DON'T PANIC" in large, friendly letters

The double quote analog of the single quote regex looks like "[^"]*", but that won't handle the escaped quotes. A regex of \\" will match an escaped quote, but only one. To combine these regexes, I use the same approach as before: move the * from the first regex to the combined regex, and put \\" first in the alternation so that it is tried before [^"]. The combined regex is "(\\"|[^"])*".

$ cat in
BAZ="\"DON'T PANIC\" in large, friendly letters"
$ perl -ne 'print "$&\n" if m/"(\\"|[^"])*"/' in
"\"DON'T PANIC\" in large, friendly letters"

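To see why the escaped-quote alternative matters, the same comparison can be sketched in Python, whose re module also tries alternatives left to right. This is only an illustration of the two patterns discussed above; the names naive and combined are made up for the sketch.

```python
import re

# Same sample line as in the post (a raw string keeps the backslashes intact).
line = r'''BAZ="\"DON'T PANIC\" in large, friendly letters"'''

# Naive analog of the single quote regex: it stops at the first double quote,
# even though that quote is escaped.
naive = re.compile(r'"[^"]*"')

# Combined regex from the post: the escaped-quote alternative is listed first,
# so it is tried before the "any character but a quote" alternative.
combined = re.compile(r'"(\\"|[^"])*"')

naive_match = naive.search(line).group(0)
combined_match = combined.search(line).group(0)

print(naive_match)     # "\"
print(combined_match)  # "\"DON'T PANIC\" in large, friendly letters"
```

The naive pattern ends at the first escaped quote, while the combined pattern runs through to the real closing quote.
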
Notice that the regex matches only the quoted value; it doesn't match the key portion (BAZ=) of the variable line. I'll address that in the next post.
