Tuesday, May 3, 2011

Regex for Bash-style Variables, Concluded

The last regex handled double-quoted strings, but not the variable name and equals sign that precede them. I could extend the regex to match the name and equals sign specifically, but my earlier regex, which handled single quotes, already matched the variable name and equals sign. Ultimately, I want one regex that matches all forms of Bash variable setting, so I'll combine the two.

I've been careful up till now to appropriately quote my regex for use at the command line. Now that both single and double quotes will be present, I'll switch to using a file for my script.

$ cat bashvars
#!/usr/bin/perl -n

print "$&\n" if m/("(\\"|[^"])*"|'[^']*'|[^#\n])*/
$ cat in
FOO=42 # answer to the question
BAR='easter bunny #2' # hippity hoppity
BAZ="\"DON'T PANIC\" in large, friendly letters"
$ ./bashvars in
FOO=42
BAR='easter bunny #2'
BAZ="\"DON'T PANIC\" in large, friendly letters"

If the input is limited to lines that set variables, the above script works. But if the input is, say, a whole Bash script, it quickly becomes apparent that more than just variables are matched: the regex accepts any text up to a comment, so something is printed for every line. I will (finally) add to the regex to insist that it match a variable name and equals sign. I'll also add a semicolon to the most generic character class (the negated one, so the match stops at a semicolon) to cover those times when a variable setting is followed by code on the same line.

$ cat bashvars2
#!/usr/bin/perl -n

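# $v: a value built from double-quoted, single-quoted, or bare text
$v = qr/("(\\"|[^"])*"|'[^']*'|[^#;\n])*/;
# $kvp: optional indentation, a variable name, an equals sign, then the value
$kvp = qr/^\s*([_a-zA-Z]\w*=$v)/;
print "$1\n" if $_ =~ $kvp;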