Flagrant Badassery

A JavaScript and regular expression centric blog

Regex Performance Optimization

Crafting efficient regular expressions is somewhat of an art. In large part, it centers around controlling/minimizing backtracking and the number of steps it takes the regex engine to match or fail, but the fact that most engines implement different sets of internal optimizations (which can either make certain operations faster, or avoid work by performing simplified pre-checks or skipping unnecessary operations, etc.) also makes the topic dependent on the particular regex flavor and implementation you're using to a significant extent. Most developers aren't deeply aware of regex performance issues, so when they run into problems with regular expressions running slowly, their first thought is to remove the regexes. In reality, most non-trivial regexes I've seen could be significantly optimized for efficiency, and I've frequently seen dramatic improvements as a result of doing so.

The best discussion of regex optimization I've seen is chapter six (a good 60 pages) of Mastering Regular Expressions, Third Edition by Jeffrey Friedl. Unfortunately, other good material on the subject is fairly scarce, so I was pleasantly surprised to stumble upon the recent article Optimizing regular expressions in Java by Cristian Mocanu. Of course, it is in part Java specific, but for the most part it is a good article on the basics of optimization for traditional NFA regex engines. Check it out.

Have you seen any good articles or discussions about regular expression performance or efficiency optimization recently? Do you have any questions about the subject? Experience or pointers to share? Let me know. (I hope to eventually write up an in-depth article on JavaScript regex optimization, with lots of tips, techniques, and cross-browser benchmarks.)

There Are 4 Responses So Far. »

  1. I’ve got an awesome tip for you. Did you know you can use ?: to avoid capturing a backreference when one isn’t needed? LOL!

    I’m just kidding. Obviously you know plenty enough as it is, but thanks to your elite regex wizardry (which I abbreviate to regexardry) I know enough about regular expressions to yell at the n00bs at work.

  2. Hi Steve,
    Great blog, just wondering if you could help.. I have the regular expression for the bit i want to disregard, how do i keep the bit i want?

    All i want to do, is strip a URL of the anchor..
    So i had var regexp = /\/#/;

    but that’s the bit i want to throw away! how do i keep the other bit? surely it must be easily possible like in PHP

    thanks in advance :)

    great blog btw..

    jon

  3. Hi Jon, thanks!

    Just replace the text you match with an empty string, and what’s left will be the part you want to keep, e.g.:

    str = str.replace(/#.*/, "");

    Or in this case you could also use str = str.match(/^[^#]*/)[0];

    However, for the case you described you don’t really need to use a regex. Try this instead:

    str = str.substring(0, str.indexOf("#"));

  4. great thanks a lot! Friday afternoon here in UK, would explain why i didn’t see that one..

    have a good weekend.
    j

Post a Response

If you are about to post code, please escape your HTML entities (&, >, <).