A JScript/VBScript Regex Lookahead Bug

Here's one of the oddest and most significant regex bugs in Internet Explorer. It can appear when a lookahead contains a token that is allowed to match zero times (e.g., via ?, *, {0,n}, or (.|); but not with +, interval quantifiers whose minimum is one or higher, or alternation without a zero-length option). An example in JavaScript:

/(?=a?b)ab/.test("ab");
// Should return true, but IE 5.5 through IE 8 beta 1 return false

/(?=a?b)ab/.test("abc");
// Correctly returns true (even in IE), although the
// added "c" does not take part in the match

I've been aware of this bug for a couple of years, thanks to a blog post by Michael Ash that describes the bug with a password-complexity regex. However, the bug description there is incomplete and subtly incorrect, as shown by the reduced test case above. To be honest, although the errant behavior is predictable, it's a bit tricky to describe because I haven't yet figured out exactly what's happening internally. I'd recommend playing with variations of the above code to get a better understanding of the problem.
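
As a starting point for that kind of experimenting, here's a rough sketch that runs several variations against "ab". The names `variations`, `affected`, and `results` are mine, and the `affected` flags simply restate the claims made above; I haven't verified each one in every IE version. In a standards-compliant engine every pattern here should match, so any false result points to the bug.

```javascript
// Each pattern is a lookahead followed by "ab", tested against the string "ab".
// `affected` restates the article's claim about which forms trigger the IE bug;
// it is an assumption for illustration, not a verified result.
var variations = [
  { pattern: /(?=a?b)ab/,     affected: true  },
  { pattern: /(?=a*b)ab/,     affected: true  },
  { pattern: /(?=a{0,1}b)ab/, affected: true  },
  { pattern: /(?=(a|)b)ab/,   affected: true  },
  { pattern: /(?=a+b)ab/,     affected: false },
  { pattern: /(?=a{1,2}b)ab/, affected: false },
  { pattern: /(?=(a|x)b)ab/,  affected: false }
];

var results = [];
for (var i = 0; i < variations.length; i++) {
  results.push({
    source: variations[i].pattern.source,
    affected: variations[i].affected,
    matched: variations[i].pattern.test("ab")
  });
}

// Print the outcome where a console is available (old IE has none by default).
if (typeof console !== "undefined") {
  for (var j = 0; j < results.length; j++) {
    console.log(results[j].source + " -> " + results[j].matched);
  }
}
```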

Fortunately, since the bug is predictable, it's usually possible to work around. For example, you can avoid the bug with the password regex in Michael's post (/^(?=.*\d)(?=.*[a-z])(?=.*[A-Z]).{8,15}$/) by writing it as /^(?=.{8,15}$)(?=.*\d)(?=.*[a-z])(?=.*[A-Z]).*/ (the .{8,15}$ lookahead must come first here). The important thing is to be aware of the issue, because it can easily introduce latent and difficult-to-diagnose bugs into your code. Just remember that it shows up with variable-length lookahead. If you're using such patterns, test the hell out of them in IE.
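
To check that a rewrite like this hasn't changed what the regex accepts, a quick comparison along the following lines can help. This is only a sketch: the sample strings and the names `original`, `rewritten`, `samples`, and `comparison` are made up for illustration. In a standards-compliant engine the two patterns should agree on every input; in an affected IE version, a disagreement shows the bug at work.

```javascript
// The regex from Michael's post and the reordered workaround from this article.
var original  = /^(?=.*\d)(?=.*[a-z])(?=.*[A-Z]).{8,15}$/;
var rewritten = /^(?=.{8,15}$)(?=.*\d)(?=.*[a-z])(?=.*[A-Z]).*/;

// Hypothetical sample passwords covering each rule.
var samples = [
  "415ggWy6",         // valid: 8 chars, digit, lowercase, uppercase
  "Passw0rd",         // valid
  "short1A",          // too short (7 chars)
  "alllowercase1",    // no uppercase letter
  "NoDigitsHere",     // no digit
  "Abcdefgh12345678"  // too long (16 chars)
];

var comparison = [];
for (var i = 0; i < samples.length; i++) {
  var a = original.test(samples[i]);
  var b = rewritten.test(samples[i]);
  comparison.push({ sample: samples[i], original: a, rewritten: b, agree: a === b });
}
```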

25 thoughts on “A JScript/VBScript Regex Lookahead Bug”

  1. My guess is that after matching a?b, the regex engine notices it has reached the end of the data, and concludes that the following tokens can no longer match. The conclusion is incorrect because the zero-width nature of the lookahead resets the position the regex is at in the string.

    Quantifiers that make their token optional require special handling. E.g. given the regex (a?)* you don’t want the * to forever repeat the case where ? repeats zero times.

    This matches correctly:
    /(?=x?ab)ab/.test("ab");
    This matches, but captures "a" into \1 instead of "ab":
    /(?=(ab?))ab/.test("ab");

    So it seems that the false end-of-data test only occurs when there are further backtracking positions to attempt. In the first of the two above, all permutations x? have already been exhausted by the time the “ab” in the lookahead matches.
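
Since test() only returns a boolean, the capture described in comment 1 is easier to inspect with exec(). A minimal sketch (the variable names `match` and `captured` are mine; the IE result is as reported in the comment, not something this snippet verifies):

```javascript
// exec() returns the match array, so the capturing group can be read directly.
var match = /(?=(ab?))ab/.exec("ab");

// Group 1 is captured inside the lookahead. A standards-compliant engine
// captures "ab"; the comment above reports that IE captures only "a".
var captured = match ? match[1] : null;

if (typeof console !== "undefined") {
  console.log("whole match: " + (match ? match[0] : null) + ", group 1: " + captured);
}
```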

  2. The suggested code /^(?=.{8,15}$)(?=.*\d)(?=.*[a-z])(?=.*[A-Z]).*/
    does not work for ASP.NET regex validators because the underlying JavaScript does this:

    var rx = new RegExp(val.validationexpression);
    var matches = rx.exec(value);
    return (matches != null && value == matches[0])

    Since all of these are lookahead groups, none of them consume any text. I found that using “(?=^.{6,25}$)(?=.*[a-z])(?=.*[A-Z]).*” solves the problem: the lookaheads enforce the length, at least one lowercase letter, and at least one uppercase letter, and the .* then matches the entire text.

    Thanks for your help with the regex.

  3. @Jan Goyvaerts: Excellent observations.

    @Hananiel Sarella: I think you’re misunderstanding something. There’s a .* at the end of the regex I posted, too. It should work just fine.
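
To illustrate why the trailing .* matters for the validator logic quoted in comment 2, here's a small sketch that wraps that logic in a function. The function name `isFullMatch` and the two pattern variables are hypothetical; the body follows the three lines quoted in the comment, not the actual ASP.NET source.

```javascript
// Mirrors the quoted validator check: the value is valid only when the
// overall match (matches[0]) equals the entire input.
function isFullMatch(validationExpression, value) {
  var rx = new RegExp(validationExpression);
  var matches = rx.exec(value);
  return matches != null && value == matches[0];
}

// Backslashes are doubled because these are string literals, not regex literals.
var withTail    = "^(?=.{8,15}$)(?=.*\\d)(?=.*[a-z])(?=.*[A-Z]).*";
var withoutTail = "^(?=.{8,15}$)(?=.*\\d)(?=.*[a-z])(?=.*[A-Z])";

// With the trailing .*, matches[0] is the whole input, so the check passes.
var okWithTail = isFullMatch(withTail, "415ggWy6");

// Without it, the lookaheads succeed but consume nothing, so matches[0]
// is the empty string and the check fails.
var okWithoutTail = isFullMatch(withoutTail, "415ggWy6");
```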

  4. I guess I never paid attention to it. I was debugging this problem for a while, and although I found and understood what yours was doing, I didn’t pay attention to the ‘.*’. Finally, after stepping through the framework JavaScript, I realized what was going on and came up with the solution you had already posted. Figures! Thanks for your good work; it saved me a lot of trouble.

  5. Thanks – you’ve saved me some time having to come up with an alternative (non-RegEx) solution to this. Well done!

    I found it also worked where the . is replaced by, say, [A-Za-z0-9]

    E.g.

    /^(?=.{8,15}$)(?=.*\d)(?=.*[a-z])(?=.*[A-Z]).*/

    becomes

    /^(?=[A-Za-z0-9]{8,15}$)(?=[A-Za-z0-9]*\d)(?=[A-Za-z0-9]*[a-z])(?=[A-Za-z0-9]*[A-Z])[A-Za-z0-9]*/

  6. Trying some tests on: http://www.w3schools.com/jsref/tryit.asp?filename=tryjsref_exec_regexp

    Just pasted the following within the script tags:

    var str = "415ggWy6";
    var patt = new RegExp("^(?=.{8,15}$)(?=.*\d)(?=.*[a-z])(?=.*[A-Z]).*");
    var result = patt.exec(str) + ": Valid? " + patt.test(str);
    document.write(result + "");
    document.write("Input: " + str + "");
    document.write("RegEx: " + patt);

    And my result was:

    null: Valid? false
    Input: 415ggWy6
    RegEx: /^(?=.{8,15}$)(?=.*d)(?=.*[a-z])(?=.*[A-Z]).*/

    Am I missing something? Shouldn’t this be returning true?

    Thanks in advance. Glad someone out there likes RegEx’s 😉
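
A note on comment 6: the printed regex shows (?=.*d) rather than (?=.*\d), which suggests the backslash was lost because the pattern was passed as a string literal, where "\d" is just "d". Assuming that is what happened here, the sketch below shows the difference; the names `unescaped` and `escaped` are mine.

```javascript
var str = "415ggWy6";

// In a string literal, "\d" is an unknown escape and collapses to "d",
// so this lookahead requires a literal letter d, which str does not contain.
var unescaped = new RegExp("^(?=.{8,15}$)(?=.*\d)(?=.*[a-z])(?=.*[A-Z]).*");

// Doubling the backslash keeps \d (a digit) in the resulting pattern.
var escaped = new RegExp("^(?=.{8,15}$)(?=.*\\d)(?=.*[a-z])(?=.*[A-Z]).*");

var unescapedResult = unescaped.test(str);
var escapedResult = escaped.test(str);

if (typeof console !== "undefined") {
  console.log(unescaped.source + " -> " + unescapedResult);
  console.log(escaped.source + " -> " + escapedResult);
}
```

Using a regex literal instead of the RegExp constructor avoids the double escaping altogether.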

  7. Not sure how relevant this is, but I came across an odd issue in IE just now. My validation expression ^(?=.*\d)[a-zA-Z0-9]{6,20}$ worked in FF 3 but not in IE 7, where it kept *not* matching “valid” entries (I didn’t test other browsers).
    I changed the curly-bracket part specifying the range to (?=.{6,20}) and it worked. Weird, or not? It always worked in FF 3 either way, but IE only accepted the second way… I don’t have time right now, but I will revisit this later to test in other browsers as well, unless someone here has the time 🙂

  8. OK, my fix didn’t work, as it now allows non-alphanumeric chars!!!! I’m not sure how something so presumably simple is beginning to turn into an absolute nightmare!!!!!! Validating a password cross-browser is simple? You’d think!
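
For the case in comments 7 and 8, one option is to apply the article's workaround while keeping the character class inside the length lookahead, in the spirit of comment 5. This is a sketch only: the name `candidate` and the sample inputs are mine, and I have not tested it in IE 7.

```javascript
// The length-and-charset lookahead comes first and is anchored with $,
// so non-alphanumeric characters are still rejected.
var candidate = /^(?=[a-zA-Z0-9]{6,20}$)(?=.*\d).*/;

var checks = {
  "abc123":  candidate.test("abc123"),   // alphanumeric, 6 chars, has a digit
  "abc$123": candidate.test("abc$123"),  // contains a non-alphanumeric char
  "abcdef":  candidate.test("abcdef"),   // no digit
  "ab1":     candidate.test("ab1")       // too short
};
```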

  9. Works perfectly. I was having issues with IE 6 and 7, but not IE 8 or FF 3, etc. After applying your fix, they all work as expected! Excellent.

  10. Did anyone actually file a bug at Microsoft Connect for this? And does this still exist after all the years of service packs and updates?

  11. It looks like the capturing is the problem. My solution:

    follow every lookahead (except the last one, if it is the ending of the regex) with:
    .*?

    If your regex ends with a lookahead, follow it with
    .*

    IE7 appears to want to end the validation with a capture.

  12. I need help with a regular expression for a strong password. The conditions are:

    Password must contain at least 8 characters, including at least 2 numbers

    Must be allowed:
    1abcdef2
    21abcdef
    abcdef22
    1abcd4ef
    1a$bcdef
    1abcd^f2
    12345678
    1234567d
    1234567%
    1234567%33434
    1234567%ddd
    1234567dd%

    Not allowed:
    abcdef5%

    It has to work in IE 7 and IE 8.

    I have a regular expression (ValidationExpression=”(?=(?:.*?\d){2})(?=(?:.*?[A-Za-z@#$%^&+=]){2}).{8,}” )
    but it is not working in IE 7.

  13. Hi,

    I am using the following regex to permit only alphanumeric characters plus underscore, period, hyphen and space.

    var reg = /^(?:[a-zA-Z_0-9\.]+ ?)*$/;

    It works fine in IE8 and the latest Mozilla, but it does not work in IE7.

    Any thoughts?
    Thanks in advance.

  14. I’m having an issue with email validation with this bug, if anyone can help.

    ^\w+([\.-]?\w+)*@\w+([\.-]?\w+)*(\.\w{2,3})+$

    It works on every modern browser including IE8, but reports the email as invalid in IE7 or below.

  15. Disregard my post. The regex was working fine; the problem was that two input boxes shared the same name attribute, so the validation script was failing.
