A JScript/VBScript Regex Lookahead Bug
Here's one of the oddest and most significant regex bugs in Internet Explorer. It can appear when lookahead contains optional elision, i.e., a token that is allowed to match zero times (e.g., via ?, *, {0,n}, or (.|), but not via +, interval quantifiers with a minimum of one or higher, or alternation without a zero-length option). An example in JavaScript:
/(?=a?b)ab/.test("ab");  // Should return true, but IE 5.5 – 8b1 return false
/(?=a?b)ab/.test("abc"); // Correctly returns true (even in IE), although the
                         // added "c" does not take part in the match
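To check a particular engine against the reduced test case, a small harness like the one below may help. It is only a sketch: the cases array and the runCases helper are hypothetical names, and the code sticks to var, for loops, and string concatenation so it can also run in old IE. In a spec-compliant engine (any current browser or Node.js), both lines report true. In affected IE versions, the first line is the one that fails.

```javascript
// Each case pairs a pattern with an input that, per the ECMAScript spec,
// should match. (No trailing comma: old IE mishandles it in array literals.)
var cases = [
  { pattern: /(?=a?b)ab/, input: "ab" },  // the case IE 5.5 - 8b1 get wrong
  { pattern: /(?=a?b)ab/, input: "abc" }  // correct even in IE
];

// Hypothetical helper: returns one result line per case.
function runCases(list) {
  var lines = [];
  for (var i = 0; i < list.length; i++) {
    var c = list[i];
    lines.push(String(c.pattern) + ' on "' + c.input + '": ' + c.pattern.test(c.input));
  }
  return lines;
}

var results = runCases(cases);
for (var j = 0; j < results.length; j++) {
  // In old IE, console exists only with the developer tools open;
  // swap in document.write or alert there if needed.
  console.log(results[j]);
}
```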
I've been aware of this bug for a couple of years, thanks to a blog post by Michael Ash that describes the bug with a password-complexity regex. However, the bug description there is incomplete and subtly incorrect, as the reduced test case above shows. To be honest, although the errant behavior is predictable, it's a bit tricky to describe because I haven't yet figured out exactly what's happening internally. I'd recommend playing with variations of the above code to get a better understanding of the problem.
Fortunately, since the bug is predictable, it's usually possible to work around. For example, you can avoid the bug with the password regex in Michael's post (/^(?=.*\d)(?=.*[a-z])(?=.*[A-Z]).{8,15}$/) by writing it as /^(?=.{8,15}$)(?=.*\d)(?=.*[a-z])(?=.*[A-Z]).*/ (the .{8,15}$ lookahead must come first here). The important thing is to be aware of the issue, because it can easily introduce latent, difficult-to-diagnose bugs into your code. Just remember that it shows up with variable-length lookahead, specifically lookahead containing elements that can match zero times. If you're using such patterns, test the hell out of them in IE.
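One way to gain confidence in a rewrite like this is to run the original and reordered patterns side by side over a few inputs. The sketch below does that. The sample passwords are hypothetical, and the "expected" column is what a spec-compliant engine returns. Both patterns give the same test() result for every input, because the anchored (?=.{8,15}$) check and the trailing .* together enforce the same length and newline-free constraint as .{8,15}$. In an affected IE version, a disagreement between the two columns points at the bug.

```javascript
// The password-complexity pattern from Michael Ash's post...
var original = /^(?=.*\d)(?=.*[a-z])(?=.*[A-Z]).{8,15}$/;
// ...and the reordered form: the anchored length check comes first.
var reordered = /^(?=.{8,15}$)(?=.*\d)(?=.*[a-z])(?=.*[A-Z]).*/;

// Hypothetical sample inputs paired with the spec-compliant result.
var samples = [
  ["Passw0rd", true],          // 8 chars, digit, lowercase, uppercase
  ["password", false],         // no digit, no uppercase
  ["Pass1", false],            // too short
  ["Passw0rdPassw0rd", false]  // 16 chars: too long
];

for (var i = 0; i < samples.length; i++) {
  var s = samples[i][0];
  var expected = samples[i][1];
  console.log(s + ": original=" + original.test(s) +
              ", reordered=" + reordered.test(s) +
              ", expected=" + expected);
}
```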


Comment by Jan Goyvaerts on 26 March 2008:
My guess is that after matching a?b, the regex engine notices it has reached the end of the data, and concludes that the following tokens can no longer match. The conclusion is incorrect because the zero-width nature of the lookahead resets the position the regex is at in the string.
Quantifiers that make their token optional require special handling. For example, given the regex (a?)*, you don’t want the * to repeat forever the case where ? matches zero times.
This matches correctly:
/(?=x?ab)ab/.test("ab");
This matches, but captures “a” into \1 instead of “ab”:
/(?=(ab?))ab/.test("ab");
So it seems that the false end-of-data test only occurs when there are further backtracking positions to attempt. In the first of the two examples above, all permutations of x? have already been exhausted by the time the “ab” in the lookahead matches.
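For reference, here is what a spec-compliant engine does with Jan's two variants and with the (a?)* case he mentions. This is a sketch for comparison against IE rather than a reproduction of IE's internals. The (a?)* part relies on ECMAScript's rule that a quantifier with a minimum of zero rejects an iteration that ends where it started. That rule is what keeps the * from repeating an empty a? forever.

```javascript
// Jan's first variant: matches (even in IE, per his comment).
var noBacktrackLeft = /(?=x?ab)ab/.test("ab");

// Jan's second variant: IE reportedly captures "a" here. In a
// spec-compliant engine the greedy b? matches, so group 1 is "ab".
var capture = /(?=(ab?))ab/.exec("ab");

console.log(noBacktrackLeft, capture && capture[1]);

// After two iterations that each match "a", a third iteration could only
// match the empty string, which the spec rejects, so the loop stops and
// group 1 keeps the value from the last non-empty iteration.
var repeat = /(a?)*/.exec("aa");
console.log(repeat[0], repeat[1]);
```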
Comment by Hananiel Sarella on 7 May 2008:
The suggested code /^(?=.{8,15}$)(?=.*\d)(?=.*[a-z])(?=.*[A-Z]).*/
does not work for ASP.NET regex validators, because the underlying JavaScript does this:
var rx = new RegExp(val.validationexpression);
var matches = rx.exec(value);
return (matches != null && value == matches[0]);
Since all of these are lookahead groups, none of them capture. I found that by using “(?=^.{6,25}$)(?=.*[a-z])(?=.*[A-Z]).*” the problem is solved, because while the lookaheads enforce the length, at least one lowercase letter, and at least one capital, the .* then captures the entire text.
Thanks for your help with the regex.
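The key condition in the validator code quoted above is value == matches[0]: the match must consume the entire input. Lookaheads are zero-width, so a pattern made only of lookaheads produces an empty matches[0], while a trailing .* consumes the rest of the text. The sketch below mirrors those three lines in a function; the name validatorMatches is hypothetical, not the framework's own. The expressions are written as strings, as a validation expression would be, so their backslashes are doubled.

```javascript
// Hypothetical re-creation of the validator check quoted above.
function validatorMatches(value, validationExpression) {
  var rx = new RegExp(validationExpression);
  var matches = rx.exec(value);
  // Valid only if something matched AND the match covers the whole value.
  return matches != null && value == matches[0];
}

// Steve's pattern, which ends in .* and so consumes the input.
var withTrailingDotStar = "^(?=.{8,15}$)(?=.*\\d)(?=.*[a-z])(?=.*[A-Z]).*";
// The same checks with no consuming part: exec() succeeds, but matches[0] is "".
var lookaheadsOnly = "^(?=.{8,15}$)(?=.*\\d)(?=.*[a-z])(?=.*[A-Z])";

console.log(validatorMatches("Passw0rd", withTrailingDotStar)); // true
console.log(validatorMatches("Passw0rd", lookaheadsOnly));      // false
```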
Comment by Steve on 7 May 2008:
@Jan Goyvaerts: Very interesting observations.
@Hananiel Sarella: I think you’re misunderstanding something. There’s a .* at the end of the regex I posted, too. It should work just fine.
Comment by Hananiel Sarella on 8 May 2008:
I guess I never paid attention to it. I had been debugging this problem for a while, and although I found and understood what yours was doing, I didn’t notice the ‘.*’. Finally, after stepping through the framework JavaScript, I realized what was going on and came up with the solution you had already posted. Figures! Thanks for your good work; it saved me a lot of trouble.
Comment by See Wah on 31 July 2008:
Thanks! I came across this exact bug today. You are a lifesaver!
Comment by Martin on 13 October 2008:
Thanks – you’ve saved me from having to spend time coming up with an alternative (non-RegEx) solution to this. Well done!
I found it also works when the . is replaced by, say, [A-Za-z0-9].
E.g.
/^(?=.{8,15}$)(?=.*\d)(?=.*[a-z])(?=.*[A-Z]).*/
becomes
/^(?=[A-Za-z0-9]{8,15}$)(?=[A-Za-z0-9]*\d)(?=[A-Za-z0-9]*[a-z])(?=[A-Za-z0-9]*[A-Z])[A-Za-z0-9]*/
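The practical difference between the two versions is which characters they accept. Because the first lookahead is anchored with ^ and $, [A-Za-z0-9]{8,15}$ there already restricts the entire input to letters and digits; the character classes in the later lookaheads then behave the same as .* would. A quick comparison, using a hypothetical input containing "!":

```javascript
var anyChar = /^(?=.{8,15}$)(?=.*\d)(?=.*[a-z])(?=.*[A-Z]).*/;
var alnumOnly = /^(?=[A-Za-z0-9]{8,15}$)(?=[A-Za-z0-9]*\d)(?=[A-Za-z0-9]*[a-z])(?=[A-Za-z0-9]*[A-Z])[A-Za-z0-9]*/;

console.log(anyChar.test("Passw0rd!"));   // true: "." accepts "!"
console.log(alnumOnly.test("Passw0rd!")); // false: "!" is outside [A-Za-z0-9]
console.log(alnumOnly.test("Passw0rd"));  // true
```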
Comment by Terry on 10 November 2008:
Trying some tests on: http://www.w3schools.com/jsref/tryit.asp?filename=tryjsref_exec_regexp
Just pasted the following within the script tags:
var str = "415ggWy6";
var patt = new RegExp("^(?=.{8,15}$)(?=.*\d)(?=.*[a-z])(?=.*[A-Z]).*");
var result = patt.exec(str) + ": Valid? " + patt.test(str);
document.write(result + "");
document.write("Input: " + str + "");
document.write("RegEx: " + patt);
And my result was:
null: Valid? false
Input: 415ggWy6
RegEx: /^(?=.{8,15}$)(?=.*d)(?=.*[a-z])(?=.*[A-Z]).*/
Am I missing something? Shouldn’t this be returning true?
Thanks in advance. Glad someone out there likes RegExes.
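The printed RegEx line above shows what went wrong: the pattern contains (?=.*d) rather than (?=.*\d). Inside a JavaScript string literal, "\d" is not a recognized escape, so it evaluates to a plain "d" before new RegExp ever sees it. The lookahead then demands a literal "d", which "415ggWy6" does not contain. This has nothing to do with the IE bug. The sketch below shows the problem and two standard fixes, doubling the backslash or using a regex literal:

```javascript
var str = "415ggWy6";

// "\d" in a string literal becomes just "d", so this pattern
// requires a literal "d" somewhere in the input.
var broken = new RegExp("^(?=.{8,15}$)(?=.*\d)(?=.*[a-z])(?=.*[A-Z]).*");

// Doubling the backslash passes \d through to the regex engine.
var fixed = new RegExp("^(?=.{8,15}$)(?=.*\\d)(?=.*[a-z])(?=.*[A-Z]).*");

// A regex literal avoids the string-escaping layer altogether.
var literal = /^(?=.{8,15}$)(?=.*\d)(?=.*[a-z])(?=.*[A-Z]).*/;

console.log(broken.source);     // ^(?=.{8,15}$)(?=.*d)(?=.*[a-z])(?=.*[A-Z]).*
console.log(broken.test(str));  // false
console.log(fixed.test(str));   // true
console.log(literal.test(str)); // true
```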
Pingback by Michael Ash's Regex Blog : Looking again at the Lookahead bug on 21 February 2009:
[...] Levithan looked much closer at the problem in general and discussed it on his blog. He came to the conclusion that the qualifiers with a minimum boundary of zero, within the [...]
Comment by Craig on 9 June 2009:
Not sure how relevant this is, but I came across an odd issue in IE just now. My validation expression ^(?=.*\d)[a-zA-Z0-9]{6,20}$ worked in FF 3, but in IE 7 (the only other browser I tested) it kept *not* matching “valid” entries.
I changed the curly-bracket part specifying the range to (?=.{6,20}) and it worked. Weird, or not? It always worked in FF 3 either way, but IE only accepted the second way… I don’t have time right now to test other browsers, but I will revisit this later, unless someone here has the time.
Comment by Craig on 9 June 2009:
OK, my fix didn’t work, as it now allows non-alphanumeric chars!!! I’m not sure how something so presumably simple is turning into an absolute nightmare!!! Validating a password cross-browser is simple? You’d think!
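Craig's original pattern starts with (?=.*\d), which is lookahead containing a zero-minimum quantifier, the shape the post describes. One option is to apply the post's workaround here: move the anchored length check into the first lookahead, and keep [a-zA-Z0-9] inside it so non-alphanumeric characters are still rejected. The sketch below checks that the reordered pattern agrees with Craig's original in a spec-compliant engine. It has not been verified in IE 7 here, and the sample inputs are hypothetical.

```javascript
// Craig's original: digit lookahead first, then the length/charset check.
var craig = /^(?=.*\d)[a-zA-Z0-9]{6,20}$/;
// Reordered in the style of the post's workaround: the anchored
// length-and-charset check goes in the first lookahead.
var reordered = /^(?=[a-zA-Z0-9]{6,20}$)(?=.*\d).*/;

// Hypothetical inputs: valid, no digit, non-alphanumeric, too short.
var samples = ["abc123", "abcdef", "abc12!", "a1"];
for (var i = 0; i < samples.length; i++) {
  // In a spec-compliant engine both columns agree for every sample.
  console.log(samples[i] + ": " + craig.test(samples[i]) + " / " + reordered.test(samples[i]));
}
```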
Pingback by Regular expressions and the ASP.NET RegularExpressionValidator control – an overview of useful links on 10 September 2009:
[...] JScript/VBScript bug that is also present in Internet Explorer (almost any version): A JScript/VBScript Regex Lookahead Bug [...]
Comment by mays on 17 December 2009:
@Craig: I guess that is because of the .* wildcard in (?=.*\d).
Try this:
/(?=^[a-zA-Z0-9]{6,20}$)(?=.+\d).+/
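One caveat with this suggestion: .+ requires at least one character before the digit. Because the first lookahead contains ^, the match can only start at position 0, so a password whose only digit is its first character is rejected. The sketch below shows this with hypothetical inputs, alongside a variant that uses .* in the digit lookahead. That variant keeps the anchored length lookahead first, as in the post's workaround, but it does reintroduce a zero-minimum quantifier inside lookahead and has not been checked in IE here.

```javascript
var mays = /(?=^[a-zA-Z0-9]{6,20}$)(?=.+\d).+/;
// Same structure, but ".*" allows the digit to be the first character.
var variant = /(?=^[a-zA-Z0-9]{6,20}$)(?=.*\d).+/;

console.log(mays.test("abc1def"));    // true
console.log(mays.test("1abcdef"));    // false: no character precedes the only digit
console.log(variant.test("1abcdef")); // true
```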