Thursday 17 March 2016

Pattern matching in Sublime Text with regex

An exercise in Yak shaving

Recently I had a very large XML file in which I needed to do some string manipulation and replacements. Sublime Text is always my go-to editor for this type of thing.
Here is an example snippet.
<data name="NumberToWord_1" xml:space="preserve">
  <value>first</value>
</data>
<data name="NumberToWord_2" xml:space="preserve">
  <value>second-value</value>
</data>
<data name="NumberToWord_3" xml:space="preserve">
  <value>3rd text value</value>
</data>


I needed to select the names and then paste them into the values.

First we need to select the name with one of the following regexes:
(?<=<data name=").+?(?=")
(?<=<data name=")[^"]+
<data name="\K[^"]+
^.*?"\K\w+

The first uses a lookbehind, a lookahead and a non-greedy (lazy) quantifier, so it is maybe not the easiest to understand. I evolved this into the second regex by doing away with the lazy quantifier and the lookahead at the end. The third is basically the second rewritten with the meta-character \K, which resets the start point of the match (keep). Finally I trimmed away the literal <data name= prefix by looking for the first double quote on the line, resetting the match start with \K and after that only selecting word characters.
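If you want to sanity-check the name patterns outside the editor, here is a minimal Python sketch run against the sample snippet from this post. It rests on one assumption worth calling out: Python's standard `re` module has no `\K`, so a capture group stands in for the third and fourth patterns. The variable names are mine and purely illustrative.

```python
import re

# Sample data copied from the snippet earlier in this post.
SAMPLE = '''<data name="NumberToWord_1" xml:space="preserve">
  <value>first</value>
</data>
<data name="NumberToWord_2" xml:space="preserve">
  <value>second-value</value>
</data>
<data name="NumberToWord_3" xml:space="preserve">
  <value>3rd text value</value>
</data>
'''

# Patterns 1 and 2 work unchanged because the lookbehind has a fixed width.
lazy_with_lookahead = re.findall(r'(?<=<data name=").+?(?=")', SAMPLE)
negated_class = re.findall(r'(?<=<data name=")[^"]+', SAMPLE)

# Stand-ins for patterns 3 and 4: a capture group replaces \K
# (assumption: stdlib re only, which does not support \K).
capture_group = re.findall(r'<data name="([^"]+)', SAMPLE)
first_quote = re.findall(r'^.*?"(\w+)', SAMPLE, flags=re.MULTILINE)

print(lazy_with_lookahead)
print(lazy_with_lookahead == negated_class == capture_group == first_quote)
```

All four variants should pick out the same three names, which is a quick way to convince yourself that the shorter patterns are equivalent on this data.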

Once I had all the name strings highlighted I could use Sublime Text's multiple copy feature to put the 100+ names onto the clipboard.

Next, to paste the names back in, we need to select the value with one of these:
(?<=<value>).+?(?=</value>)
(?<=<value>)[^<]+
<value>\K[^<]+
^.+?e>\K[^<]+

The first line again uses a lookbehind and a lookahead with a lazy match-anything in between. The second example replaces the lookahead with a more precise negated character class, [^<]. The third replaces the lookbehind with the \K keep reset. Finally, just for fun, I matched up to the first e> on the line with a lazy quantifier.
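The value patterns can be checked the same way. This is a rough Python sketch against the sample snippet, assuming the lookaround form ends with a lookahead for the closing `</value>` tag. As before, the standard `re` module has no `\K`, so the last two variants use a capture group instead, and all names here are illustrative.

```python
import re

# Sample data copied from the snippet earlier in this post.
SAMPLE = '''<data name="NumberToWord_1" xml:space="preserve">
  <value>first</value>
</data>
<data name="NumberToWord_2" xml:space="preserve">
  <value>second-value</value>
</data>
<data name="NumberToWord_3" xml:space="preserve">
  <value>3rd text value</value>
</data>
'''

# Lookbehind plus lookahead for the closing tag, with a lazy quantifier.
lookarounds = re.findall(r'(?<=<value>).+?(?=</value>)', SAMPLE)

# Lookbehind plus a negated character class that stops at the next '<'.
negated_class = re.findall(r'(?<=<value>)[^<]+', SAMPLE)

# Capture groups stand in for \K (assumption: stdlib re only).
capture_group = re.findall(r'<value>([^<]+)', SAMPLE)
first_e = re.findall(r'^.+?e>([^<]+)', SAMPLE, flags=re.MULTILINE)

print(lookarounds)
print(lookarounds == negated_class == capture_group == first_e)
```

Note that the "first e>" variant only works here because no other line in the sample contains the two characters e> together. On a messier file it could easily select the wrong thing.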

So I've now got all the values selected. Just paste the current multi clipboard over the top and we are done.
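For comparison, the whole copy-the-name-into-the-value job can be done in one substitution outside the editor. This is only a sketch of an alternative, not what I did in Sublime Text. It assumes every data element looks like the sample (a name attribute first, then a single value child), and the function name copy_names_into_values is made up for this example.

```python
import re

# Sample data copied from the snippet earlier in this post.
SAMPLE = '''<data name="NumberToWord_1" xml:space="preserve">
  <value>first</value>
</data>
<data name="NumberToWord_2" xml:space="preserve">
  <value>second-value</value>
</data>
<data name="NumberToWord_3" xml:space="preserve">
  <value>3rd text value</value>
</data>
'''

# Group 1: everything from <data ...> up to and including <value>.
# Group 2: the name attribute. The trailing [^<]* is the old value text.
ELEMENT = re.compile(r'(<data name="([^"]+)"[^>]*>\s*<value>)[^<]*')


def copy_names_into_values(xml_text: str) -> str:
    """Replace each value's text with the name of its enclosing data element."""
    return ELEMENT.sub(lambda m: m.group(1) + m.group(2), xml_text)


result = copy_names_into_values(SAMPLE)
print(result)
```

Regex on XML is fine for a regular, machine-generated file like this one, but it will not cope with nested elements or attributes in a different order.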

Alternatives

Now there are many ways to skin a cat. I really wanted to play with regex today, so this was a nice exercise to practice more advanced regex. But I know not everyone likes or gets regex, so another approach would be to use the cursor. First select all lines with a data element (multi-select), press home, then press ctrl+right until the cursor is just after the first quote. Then press ctrl+shift+right to select the word and copy them all. Press the down arrow to move to the value element, then end and ctrl+left to get the cursor just inside the end of the element value, and ctrl+k, ctrl+space to set a mark. Press home and ctrl+right to get the cursor just inside the start of the element value, then ctrl+k, ctrl+a to select to the mark, and paste. Or for this last step use ctrl+shift+a, Sublime's expand selection to tag, which works in XML documents.
There are so many variations on this cursor-based approach that I can't list them all here. Suffice to say, Sublime Text is very powerful at text manipulation. Learn your tooling, people, and enjoy.

Want to see this in action? Watch this video:

Recorded on KRUT (an open source screen recorder) with keyjedi (to show the keyboard shortcuts).

If you think there is a simpler (or more clever :-) approach, please leave a comment and share. How bald is your Yak now?

