Excited by the fact that I can mimic atomic groups when using most regex libraries which don't support them, I set my sights on another of my most wanted features which is commonly lacking: conditionals (which provide an if-then-else construct). Of the regex libraries I'm familiar with, conditionals are only supported by .NET, Perl, PCRE (and hence, PHP's preg functions), and JGsoft products (including RegexBuddy).
There are two common types of regex conditionals in those libraries: capturing-group-based and lookaround-based. I'll get to the latter type in a bit, but first I'll address capturing-group-based conditionals, which base their logic on whether a capturing group has participated in the match so far. Here's an example:
(a)?b(?(1)c|d)
That matches only "bd" and "abc". Expressed as an if-then-else: if capturing group 1 (the optional "a") has participated in the match, then match "c"; else, match "d".
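The example conditional that matches only "bd" and "abc" can be written as (a)?b(?(1)c|d). Python's re module, which isn't in the list above, happens to accept the (?(1)yes|no) syntax, so it is a convenient place to check that claim. A minimal sketch, using fullmatch so that a candidate only counts if the whole string matches; the candidate strings are just an arbitrary sample:

```python
import re

# Capturing-group-based conditional: if group 1 ("a") participated,
# "b" must be followed by "c"; otherwise it must be followed by "d".
REAL = re.compile(r"(a)?b(?(1)c|d)")

candidates = ["abc", "bd", "abd", "bc", "ac", "ad", "b"]

# fullmatch requires the pattern to consume the entire string.
matched = [s for s in candidates if REAL.fullmatch(s) is not None]
print(matched)  # ['abc', 'bd']
```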
Here's a comparable pattern I created which doesn't require support for conditionals:
(?=(a)()|())\1?b(?:\2c|\3d)
Note that to use it without an "else" part, you still need to include the second empty backreference (in this case, \3) at the end, like this:
(?=(a)()|())\1?b(?:\2c|\3)
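To compare the mimic against the real thing, the sketch below runs both through Python's re on the same sample strings, together with the no-else variants (written here as (?=(a)()|())\1?b(?:\2c|\3) and, for the real conditional, (a)?b(?(1)c)). Python's re treats a backreference to a non-participating group as a failed match, which is the behavior the mimic depends on. The sample strings are arbitrary.

```python
import re

# Real capturing-group-based conditional, for reference.
REAL = re.compile(r"(a)?b(?(1)c|d)")

# Mimic. The lookahead always succeeds:
#   (a)()  -- taken when "a" is next: sets group 1 and the empty group 2
#   ()     -- taken otherwise: sets only the empty group 3
# \1? consumes the "a" when group 1 is set; when group 1 did not
# participate, \1 fails in Python's re, so the ? lets it match nothing.
# Then \2 or \3, whichever empty group is set, unlocks its branch.
MIMIC = re.compile(r"(?=(a)()|())\1?b(?:\2c|\3d)")

# No else-part: the empty \3 must still be present as the second branch.
REAL_NO_ELSE = re.compile(r"(a)?b(?(1)c)")
MIMIC_NO_ELSE = re.compile(r"(?=(a)()|())\1?b(?:\2c|\3)")

candidates = ["abc", "bd", "abd", "bc", "ac", "b", "ab"]

for s in candidates:
    print(
        s,
        REAL.fullmatch(s) is not None,
        MIMIC.fullmatch(s) is not None,
        REAL_NO_ELSE.fullmatch(s) is not None,
        MIMIC_NO_ELSE.fullmatch(s) is not None,
    )
```

Both patterns are unanchored, so an unanchored search (re.search here, or Matcher.find() in Java) reports a match in "abd" because it finds the substring "bd" starting at index 1. Requiring a whole-string match (fullmatch, Matcher.matches(), or ^...$ anchors) rules that out.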
As a brief explanation of how that works: the lookahead at the beginning includes a zero-length alternation option, which guarantees that the lookahead always succeeds and so cancels its effect as an assertion. At the same time, the intentionally empty capturing groups within the lookahead record which option matched (group 2 if "a" was found, group 3 if not), and the then/else part is selected by backreferencing them. However, there are a couple of issues:
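- This doesn't work with some regex engines, because of how they handle backreferences to non-participating capturing groups: the trick requires such a backreference to fail, whereas some engines (JavaScript's, for example) let it match the empty string.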
- It interacts with backtracking differently than a real conditional: the "a" part is treated as if it were within an optional, atomic group such as (?>(a)?). So it might be better to think of this as a new operator which is similar to a conditional.
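The backtracking difference shows up when the inner pattern starts with the same character as the if-part. The patterns below are hypothetical examples built by substituting ab for the inner pattern b in both the real conditional (a)?b(?(1)c|d) and the mimic (?=(a)()|())\1?b(?:\2c|\3d); a sketch in Python's re:

```python
import re

# Inner pattern "ab" begins with the same "a" that the if-part tries to match.
REAL = re.compile(r"(a)?ab(?(1)c|d)")
MIMIC = re.compile(r"(?=(a)()|())\1?ab(?:\2c|\3d)")

text = "abd"

# The real conditional backtracks out of (a)?, leaving group 1 unset,
# so the else-part "d" applies and the whole string matches.
print(REAL.fullmatch(text) is not None)   # True

# The mimic's lookahead has already captured "a" (group 1) and set the
# empty group 2. Lookaheads are atomic, so the engine never revisits
# that choice: \2c fails on "d", and \3d fails because group 3 is unset.
print(MIMIC.fullmatch(text) is not None)  # False
```

Once the lookahead has decided the condition is true, nothing later in the pattern can reverse that decision, which is why the "a" behaves as if it sat inside an atomic group.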
Here are the regex engines I've briefly tested this pattern with:
| Language | Supports mimicked conditional | Supports real conditional | Notes |
|---|---|---|---|
| .NET | Yes | Yes | Tested using Expresso. |
| ColdFusion | Yes | No | Tested using ColdFusion MX7. |
| Java | Yes | No | Tested using Regular Expression Test Applet. |
| JGsoft | Yes | Yes | (Edit:) Works as of RegexBuddy version 2.4.0. Previous versions contained two bugs (which I reported to JGsoft) that prevented this from working reliably. |
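One property an engine needs before it can go in the "Yes" column is the one behind the first issue listed earlier: a backreference to a capturing group that did not participate must fail rather than match the empty string. A quick probe for that behavior, sketched in Python (the probe pattern and the function name are hypothetical, chosen only for this test):

```python
import re

# With input "b", the alternation takes its second branch, so group 1
# never participates. The question is what the backreference \1 does then.
PROBE = re.compile(r"(?:(a)|b)\1")


def unset_backref_fails() -> bool:
    # True if a backreference to a non-participating group fails to match,
    # as the mimicked conditional requires. In engines where it matches the
    # empty string instead (JavaScript is one), "b" would fully match here,
    # and an unset \2 or \3 would no longer block its branch.
    return PROBE.fullmatch("b") is None


print(unset_backref_fails())  # True with Python's re
```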
As for lookaround-based conditionals, we can mimic them using the same concepts. Here's what a real lookaround-based conditional looks like (this example uses a positive lookahead for the assertion):
And here's how you can mimic it:
Again, to use it without an "else" part, you still need to include the second empty backreference (in this case, \2) at the end, like this:
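A sketch of one way to apply the same idea to a lookaround-based condition, in Python's re (which does not support real lookaround-based conditionals, so only the mimic is exercised). The layout is an assumption, not necessarily the exact pattern intended here: an outer lookahead that always succeeds wraps an alternation between the assertion followed by an empty group and a bare empty group. The condition (the next character is a digit), the then-part (\d{3}) and the else-part (\w+) are hypothetical.

```python
import re

# (?=(?=\d)()|()) -- the outer lookahead always succeeds. Group 1 is set
#                    when the inner assertion holds, group 2 otherwise.
#                    Because lookaheads are atomic, the engine cannot later
#                    backtrack into it and switch to the other option.
# (?:\1...|\2...) -- the empty backreferences select the branch.
MIMIC = re.compile(r"(?=(?=\d)()|())(?:\1\d{3}|\2\w+)")

# The same choice without the outer lookahead is NOT a faithful mimic:
# if the then-part fails, backtracking tries the second option and the
# else-part runs even though the condition was true.
LEAKY = re.compile(r"(?:(?=\d)()|())(?:\1\d{3}|\2\w+)")

# With no else-part, the empty \2 still has to be present.
MIMIC_NO_ELSE = re.compile(r"(?=(?=\d)()|())(?:\1\d{3}|\2)")

for s in ["123", "abc", "12"]:
    print(s, MIMIC.fullmatch(s) is not None, LEAKY.fullmatch(s) is not None)
```

The LEAKY variant accepts "12" through its else-part even though the next character is a digit; wrapping the choice in a lookahead is what keeps this sketch behaving like a real conditional.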
- The above compatibility table applies just the same.
- Backtracking does not come into play with lookaround-based conditionals in the same way as with capturing-group-based conditionals. As a result, mimicked lookaround-based conditionals are functionally identical to their "real" counterparts.
- In some regex flavors, it may be necessary to write it in the somewhat less lucid form
- Another, potentially more verbose and less efficient way to mimic a lookaround-based conditional is to alternate two opposite lookarounds. E.g.,
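The opposite-lookaround approach has the general shape (?:(?=IF)THEN|(?!IF)ELSE). Whenever the first branch fails, the alternation moves on to the second branch and evaluates the assertion again in negated form, which is where the extra cost comes from. Below is a sketch in Python's re comparing it with an always-succeeding-lookahead mimic on a few inputs; the condition (the next character is a digit), the then-part (\d{3}) and the else-part (\w+) are hypothetical.

```python
import re

# Two opposite lookarounds: exactly one branch's assertion can hold.
OPPOSITE = re.compile(r"(?:(?=\d)\d{3}|(?!\d)\w+)")

# Mimic with empty groups inside an always-succeeding lookahead.
MIMIC = re.compile(r"(?=(?=\d)()|())(?:\1\d{3}|\2\w+)")

samples = ["123", "abc", "12", "1234", "a1b", "_9", ""]

for s in samples:
    print(repr(s), OPPOSITE.fullmatch(s) is not None, MIMIC.fullmatch(s) is not None)
```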
7 thoughts on “Mimicking Conditionals”
I had no idea that was possible! As no one had left a comment and because it was such a great post I thought I should say something. If you ever get a chance to expand this with more examples that would be cool. Thanks.
I agree that better examples would be helpful. Hopefully I’ll get back to this eventually.
Looks like a filtered down version of this page:
Um, no. That page, which I linked to in the first paragraph of this post, describes how to use “real” regex conditionals, which are supported by far fewer regex engines than are able to take advantage of my construct for mimicking them.
Very cool and very helpful. I hate regex but when you need it you need it. This solved my problem – thank you!
Hello, I’m trying (?=(a)()|())\1?b(?:\2c|\3d) in both regex101.com and Java, but it is not working as expected: abc and abd are both being matched. Any idea?