I near-religiously use non-capturing groups whenever I do not need to reference a group's contents. Recently, several people have asked me why this is, so here are the reasons:
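- Capturing groups negatively impact performance, because the regex engine has to record what each group matched and keep that record up to date as it backtracks. The hit may be tiny, especially when working with small strings, but it's there.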
- When you need to use several groupings in a single regex, only some of which you plan to reference later, it's convenient to have the backreferences you want to use numbered sequentially. E.g., the logic in my parseUri UDF could not be nearly as simple if I had not made appropriate use of capturing and non-capturing groups within the same regex.
- Non-capturing groups might be slightly harder to read, but ultimately they are less confusing and easier to maintain, especially for others working with your code. If I modify a regex that contains capturing groups, I have to worry about whether they're referenced anywhere outside of the regex itself, and what exactly they're expected to contain.
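To illustrate the numbering point, here is a minimal sketch in Python. The URL pattern is a made-up, simplified example for this illustration and is not the regex used in parseUri. It compares the group numbers you get when every grouping captures with the numbers you get when the purely structural grouping is made non-capturing.

```python
import re

# Hypothetical, simplified URL pattern (illustration only, not parseUri's regex).
# Every grouping captures, so the optional ":port" wrapper takes a number of its own.
all_capturing = re.compile(r"(\w+)://([^/:]+)(:(\d+))?")

# Same pattern with the wrapper made non-capturing: only the parts we
# actually plan to reference get numbers, and they are sequential.
mixed = re.compile(r"(\w+)://([^/:]+)(?::(\d+))?")

url = "http://example.com:8080/path"

print(all_capturing.match(url).groups())  # ('http', 'example.com', ':8080', '8080')
print(mixed.match(url).groups())          # ('http', 'example.com', '8080')
```

In the first version the port is group 4 and group 3 is a throwaway value; in the second, scheme, host, and port are simply groups 1, 2, and 3.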
Of course, some capturing groups are necessary. There are three scenarios which meet this description:
- You're using parts of a match to construct a replacement string, or otherwise referencing parts of the match in code outside the regex.
- You need to reuse parts of the match within the regex itself. E.g.,
(["'])(?:\\\1|.)*?\1 would match values enclosed in either double or single quotes, while requiring that the same quote type start and end the match, and allowing inner, escaped quotes of the same type as the enclosure. (Update: For details about this pattern, see Regexes in Depth: Advanced Quoted String Matching.)
- You need to test if a group has participated in the match so far, as the condition to evaluate within a conditional. E.g.,
(a)?b(?(1)c|d) only matches the values "bd" and "abc".
If a grouping doesn't meet one of the above conditions, there's no need to capture.
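The three scenarios can be sketched in Python, chosen here because its `re` module supports both backreferences and conditionals (not every regex flavor does). The sample strings and variable names are assumptions made up for the illustration.

```python
import re

# 1. Referencing captures from a replacement string:
#    turn "last, first" into "first last".
swapped = re.sub(r"(\w+), (\w+)", r"\2 \1", "Doe, Jane")
print(swapped)  # Jane Doe

# 2. Reusing a capture inside the regex itself: \1 forces the closing
#    quote to be the same character as the opening one.
quoted = re.compile(r"""(["'])(?:\\\1|.)*?\1""")
m = quoted.search(r'say "a \"b\" c" end')
print(m.group(0))                 # "a \"b\" c"
print(quoted.search("'abc\""))    # None (opened with ', never closed with ')

# 3. Testing whether a group participated, via a conditional:
#    if group 1 matched "a", require "c" after "b"; otherwise require "d".
conditional = re.compile(r"(a)?b(?(1)c|d)")
candidates = ["abc", "bd", "abd", "bc"]
matches = [s for s in candidates if conditional.fullmatch(s)]
print(matches)  # ['abc', 'bd']
```

`fullmatch` is used in the last step so that the pattern has to cover the whole string; with an unanchored search, "abd" would still yield the substring "bd".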