Mimicking Conditionals

Excited by the fact that I can mimic atomic groups when using most regex libraries which don't support them, I set my sights on another of my most wanted features which is commonly lacking: conditionals (which provide an if-then-else construct). Of the regex libraries I'm familiar with, conditionals are only supported by .NET, Perl, PCRE (and hence, PHP's preg functions), and JGsoft products (including RegexBuddy).

There are two common types of regex conditionals in those libraries: capturing-group-based and lookaround-based. I'll get to the latter type in a bit, but first I'll address capturing-group-based conditionals, which are able to base logic on whether a capturing group has participated in the match so far. Here's an example:

(a)?b(?(1)c|d)

That matches only "bd" and "abc". The pattern can be expressed as follows:

(if_matched)?inner_pattern(?(1)then|else)

Here's a comparable pattern I created which doesn't require support for conditionals:

(?=(a)()|())\1?b(?:\2c|\3d)

Note that to use it without an "else" part, you still need to include the second empty backreference (in this case, \3) at the end, like this:

(?=(a)()|())\1?b(?:\2c|\3)

As a brief explanation of how that works, there's a zero-length alternation option within the lookahead at the beginning which is used to cancel the effect of the lookahead, while at the same time, the intentionally empty capturing groups within the lookahead are exploited to base the then/else part on which option in the lookahead matched. However, there are a couple issues:

  • This doesn't work with some regex engines, due to how they handle backreferences for non-participating capturing groups.
  • It interacts with backtracking differently than a real conditional (the "a" part is treated as if it were within an optional, atomic group, e.g., (?>(a))? instead of (a)?), so it might be better to think of this as a new operator which is similar to a conditional.

Here are the regex engines I've briefly tested this pattern with:

Language Supports fake cond. Supports real cond. Notes
.NET Yes Yes Tested using Expresso.
ColdFusion Yes No Tested using ColdFusion MX7.
Java Yes No Tested using Regular Expression Test Applet.
JavaScript No No According to ECMA-262v3, backreferences to non-participating capturing groups always succeed, and most browsers respect that. Unfortunately, this pattern depends on the way most other regex engines handle such groups.
JGsoft Yes Yes (Edit:) Works as of RegexBuddy version 2.4.0. Previous versions contained two bugs (which I reported to JGsoft) which prevented this from working reliably.

As for lookaround-based conditionals, we can mimic them using the same concepts. Here's what a real lookaround-based conditional looks like (this example uses a positive lookahead for the assertion):

(?(?=if_assertion)then|else)

And here's how you can mimic it:

(?:(?=if_assertion()|())\1then|\2else)

Again, to use it without an "else" part, you still need to include the second empty backreference (in this case, \2) at the end, like this:

(?:(?=if_assertion()|())\1then|\2)

Notes:

  • The above compatibility table applies just the same.
  • Backtracking does not come into play with lookaround-based conditionals in the same way as with capturing-group-based conditionals. As a result, mimicked lookaround-based conditionals are functionally identical to their "real" counterparts.
  • In some regex flavors, it may be necessary to write it in the the somewhat less lucid form (?=a()|())(?:\1b|\2c).
  • Another, potentially more verbose and less efficient way to mimic a lookaround-based conditional is to alternate two opposite lookarounds. E.g., (?=if_assertion)then|(?!if_assertion)else. That will work even in the case of flavors like JavaScript where backreferences to non-participating groups match the empty string.

7 thoughts on “Mimicking Conditionals”

  1. I had no idea that was possible! As no one had left a comment and because it was such a great post I thought I should say something. If you ever get a chance to expand this with more examples that would be cool. Thanks.

  2. Thanks, Gary!

    I agree that better examples would be helpful. Hopefully I’ll get back to this eventually.

  3. Um, no. That page, which I linked to in the first paragraph of this post, describes how to use “real” regex conditionals, which are supported by far fewer regex engines than are able to take advantage of my construct for mimicking them.

  4. Very cool and very helpful. I hate regex but when you need it you need it. This solved my problem – thank you!

  5. Hello, I’m trying (?=(a)()|())\1?b(?:\2c|\3d) in both regex101.com and also java. but it is not working as expected. abc and abd is being matched.any idea ?

Leave a Reply

Your email address will not be published. Required fields are marked *