A JScript/VBScript Regex Lookahead Bug
Here's one of the oddest and most significant regex bugs in Internet Explorer. It can appear when using optional elision within lookahead (e.g., via ?, *, {0,n}, or (.|); but not +, interval quantifiers starting from one or higher, or alternation without a zero-length option). An example in JavaScript:
/(?=a?b)ab/.test("ab");
// Should return true, but IE 5.5 – 8b1 return false

/(?=a?b)ab/.test("abc");
// Correctly returns true (even in IE), although the
// added "c" does not take part in the match
I've been aware of this bug for a couple years, thanks to a blog post by Michael Ash that describes the bug with a password-complexity regex. However, the bug description there is incomplete and subtly incorrect, as shown by the above, reduced test case. To be honest, although the errant behavior is predictable, it's a bit tricky to describe because I haven't yet figured out exactly what's happening internally. I'd recommend playing with variations of the above code to get a better understanding of the problem.
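One way to play with variations is a small table-driven harness. The sketch below is only an illustration: the `cases` list and the `runCases` helper are hypothetical names, not something from the original test case. Each entry pairs a pattern with a string that a standards-conforming engine should match, so any entry reported as a failure is worth a closer look. Which of these specific variations actually fail in affected IE versions is something to verify for yourself.

```javascript
// Hypothetical harness for probing the lookahead bug.
// Every case below should match in a standards-conforming engine.
var cases = [
  { regex: /(?=a?b)ab/,     input: "ab"  }, // reduced test case from the post
  { regex: /(?=a?b)ab/,     input: "abc" }, // reported to work even in IE
  { regex: /(?=a*b)ab/,     input: "ab"  }, // * in place of ?
  { regex: /(?=a{0,1}b)ab/, input: "ab"  }, // {0,n} interval quantifier
  { regex: /(?=(a|)b)ab/,   input: "ab"  }, // alternation with a zero-length option
  { regex: /(?=a+b)ab/,     input: "ab"  }  // + is described as unaffected
];

// Returns a description of every case whose pattern did NOT match.
function runCases(list) {
  var failed = [];
  for (var i = 0; i < list.length; i++) {
    if (!list[i].regex.test(list[i].input)) {
      failed.push(String(list[i].regex) + " on \"" + list[i].input + "\"");
    }
  }
  return failed;
}

var failures = runCases(cases); // empty array when the engine behaves correctly
```

Adding a new variation is then a one-line change to `cases`, which makes it easy to compare browsers side by side.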
Fortunately, since the bug is predictable, it's usually possible to work around. For example, you can avoid the bug with the password regex in Michael's post (/^(?=.*\d)(?=.*[a-z])(?=.*[A-Z]).{8,15}$/) by writing it as /^(?=.{8,15}$)(?=.*\d)(?=.*[a-z])(?=.*[A-Z]).*/ (the .{8,15}$ lookahead must come first here). The important thing is to be aware of the issue, because it can easily introduce latent and difficult-to-diagnose bugs into your code. Just remember that it shows up with variable-length lookahead. If you're using such patterns, test the hell out of them in IE.
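As a quick sanity check of the rewritten password regex, here is a minimal sketch that runs it against a few sample values. The `isValidPassword` helper and the sample strings are assumptions made for illustration; the comments describe what a conforming engine returns, and the same checks would still need to be run in the IE versions you care about.

```javascript
// Workaround pattern from the post: the length lookahead comes first.
var passwordRegex = /^(?=.{8,15}$)(?=.*\d)(?=.*[a-z])(?=.*[A-Z]).*/;

// Hypothetical helper name, for illustration only.
function isValidPassword(value) {
  return passwordRegex.test(value);
}

var results = {
  ok: isValidPassword("415ggWy6"),      // 8 chars with a digit, lowercase, uppercase
  tooShort: isValidPassword("1aB"),     // fails the .{8,15}$ lookahead
  noDigit: isValidPassword("abcdEFGH"), // fails the \d lookahead
  // "1aB" followed by 13 "x" characters is 16 chars, one over the limit
  tooLong: isValidPassword("1aB" + new Array(14).join("x"))
};
```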
Comment by Jan Goyvaerts on 26 March 2008:
My guess is that after matching a?b, the regex engine notices it has reached the end of the data, and concludes that the following tokens can no longer match. The conclusion is incorrect because the zero-width nature of the lookahead resets the position the regex is at in the string.
Quantifiers that make their token optional require special handling. E.g. given the regex (a?)* you don’t want the * to forever repeat the case where ? repeats zero times.
This matches correctly:
/(?=x?ab)ab/.test("ab");
This matches, but captures “a” into \1 instead of “ab”:
/(?=(ab?))ab/.test("ab");
So it seems that the false end-of-data test only occurs when there are further backtracking positions to attempt. In the first of the two above, all permutations x? have already been exhausted by the time the “ab” in the lookahead matches.
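The two patterns in this comment can be checked directly by inspecting the capture. This is a small sketch with assumed variable names; the values noted in the code comments are what a conforming engine produces, which you can compare against the IE behavior reported above.

```javascript
// Pattern from the comment: the lookahead captures, then "ab" is matched.
var match = /(?=(ab?))ab/.exec("ab");
// A conforming engine captures "ab"; the comment reports "a" in IE.
var captured = match ? match[1] : null;

// Pattern from the comment that is reported to match correctly.
var unaffected = /(?=x?ab)ab/.test("ab");
```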
Comment by Hananiel Sarella on 7 May 2008:
The suggested code /^(?=.{8,15}$)(?=.*\d)(?=.*[a-z])(?=.*[A-Z]).*/
does not work for ASP.NET RegEx validators because the underlying JavaScript does this:
var rx = new RegExp(val.validationexpression);
var matches = rx.exec(value);
return (matches != null && value == matches[0])
Since all of these are lookahead groups, none of them capture. I found that by using “(?=^.{6,25}$)(?=.*[a-z])(?=.*[A-Z]).*” the problem is solved: the lookaheads enforce the length, at least one lowercase letter, and at least one uppercase letter, and the .* then captures the entire text.
Thanks for your help with the regex.
Comment by Steve on 7 May 2008:
@Jan Goyvaerts: Excellent observations.
@Hananiel Sarella: I think you’re misunderstanding something. There’s a .* at the end of the regex I posted, too. It should work just fine.
Comment by Hananiel Sarella on 8 May 2008:
I never paid attention to it, I guess. I was debugging this problem for a while, and although I found and understood what yours was doing, I didn’t pay attention to the ‘.*’. Finally, after stepping through the framework JavaScript, I realized what was going on and came up with the solution you already posted. Figures! Thanks for your good work, it saved me a lot of trouble.
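The full-match check quoted earlier in this thread explains why the trailing .* matters. Below is a minimal sketch of that check wrapped in a function; the name `validatesFully` is hypothetical, and the body simply mirrors the three quoted lines rather than the actual ASP.NET script.

```javascript
// Sketch of the quoted validator logic (function name is an assumption).
function validatesFully(expression, value) {
  var rx = new RegExp(expression);
  var matches = rx.exec(value);
  // The value is accepted only if the match covers the whole string.
  return matches != null && value == matches[0];
}

// Note the doubled backslash: this is a string, not a regex literal.
var lookaheadsOnly = "^(?=.{8,15}$)(?=.*\\d)(?=.*[a-z])(?=.*[A-Z])";
var withTrailingDotStar = lookaheadsOnly + ".*";

// Lookaheads consume nothing, so matches[0] is "" and the check fails.
var withoutConsume = validatesFully(lookaheadsOnly, "415ggWy6");
// The trailing .* consumes the whole value, so matches[0] equals it.
var withConsume = validatesFully(withTrailingDotStar, "415ggWy6");
```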
Comment by See Wah on 31 July 2008:
Thanks! I came across this exact bug today. You are a lifesaver!
Comment by Martin on 13 October 2008:
Thanks – you’ve saved me some time having to come up with an alternative (non-RegEx) solution to this. Well done!
I found it also worked where the . is replaced by, say, [A-Za-z0-9]
E.g.
/^(?=.{8,15}$)(?=.*\d)(?=.*[a-z])(?=.*[A-Z]).*/
becomes
/^(?=[A-Za-z0-9]{8,15}$)(?=[A-Za-z0-9]*\d)(?=[A-Za-z0-9]*[a-z])(?=[A-Za-z0-9]*[A-Z])[A-Za-z0-9]*/
Comment by Terry on 10 November 2008:
Trying some tests on: http://www.w3schools.com/jsref/tryit.asp?filename=tryjsref_exec_regexp
Just pasted the following within the script tags:
var str = "415ggWy6";
var patt = new RegExp("^(?=.{8,15}$)(?=.*\d)(?=.*[a-z])(?=.*[A-Z]).*");
var result = patt.exec(str) + ": Valid? " + patt.test(str);
document.write(result + "");
document.write("Input: " + str + "");
document.write("RegEx: " + patt);
And my result was:
null: Valid? false
Input: 415ggWy6
RegEx: /^(?=.{8,15}$)(?=.*d)(?=.*[a-z])(?=.*[A-Z]).*/
Am I missing something? Shouldn’t this be returning true?
Thanks in advance. Glad someone out there likes RegEx’s 😉
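The printed pattern in this comment shows (?=.*d) rather than (?=.*\d), which suggests the backslash was lost when the pattern was written as a string passed to the RegExp constructor. In a JavaScript string literal, "\d" is simply "d". The sketch below (variable names are assumptions) compares the string form with and without a doubled backslash against an equivalent regex literal.

```javascript
// In a string literal "\d" becomes "d", so this lookahead is really (?=.*d).
var lost = new RegExp("^(?=.{8,15}$)(?=.*\d)(?=.*[a-z])(?=.*[A-Z]).*");

// Doubling the backslash keeps \d in the compiled pattern.
var kept = new RegExp("^(?=.{8,15}$)(?=.*\\d)(?=.*[a-z])(?=.*[A-Z]).*");

// A regex literal needs no extra escaping.
var literal = /^(?=.{8,15}$)(?=.*\d)(?=.*[a-z])(?=.*[A-Z]).*/;

var str = "415ggWy6";
var lostResult = lost.test(str); // false: the input contains no letter "d"
var keptResult = kept.test(str); // true
```

This is independent of the lookahead bug itself: the escaping problem shows up in any engine.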
Pingback by Michael Ash's Regex Blog : Looking again at the Lookahead bug on 21 February 2009:
[…] Levithan looked much closer at the problem in general and discussed it on his blog. He came to the conclusion that the qualifiers with a minimum boundary of zero, within the […]
Comment by Craig on 9 June 2009:
Not sure how relevant this is, but came across an odd issue in IE just now. My validation expression ^(?=.*\d)[a-zA-Z0-9]{6,20}$ worked in FF 3 (didn’t test other browsers except IE 7, where it kept *not* matching “valid” entries).
I changed the curly bracket part specifying the range to (?=.{6,20}) and it worked. Weird? or not? It always worked in FF 3 whichever way but IE only accepted the second way… Don’t have time right now to test other browsers but will revisit this later to test in other browsers as well – unless someone here has the time 🙂
Comment by Craig on 9 June 2009:
OK, my fix didn’t work, as it now allows non-alphanumeric chars!!!! I’m not sure how something so presumably simple is beginning to turn into an absolute nightmare!!!!!! Validating a password cross-browser: simple? You’d think!
Pingback by Regular expressions and the ASP.NET RegularExpressionValidator control – an overview of useful links on 10 September 2009:
[…] JScript/VBScript bug that is also present in Internet Explorer (almost any version): A JScript/VBScript Regex Lookahead Bug […]
Comment by mays on 17 December 2009:
@Craig: that, I guess, is because of the .* wildcard in (?=.*\d)
Try this
/(?=^[a-zA-Z0-9]{6,20}$)(?=.+\d).+/
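Before adopting the suggested pattern, it may be worth comparing it with the original on a few inputs. The sketch below uses assumed sample strings and variable names; the results in the comments are for a conforming engine. One difference to be aware of: (?=.+\d) needs at least one character before the digit, so a value whose only digit is its first character is rejected by the suggested pattern but accepted by the original.

```javascript
var original = /^(?=.*\d)[a-zA-Z0-9]{6,20}$/;        // from the earlier comment
var suggested = /(?=^[a-zA-Z0-9]{6,20}$)(?=.+\d).+/; // from the comment above

var samples = ["abc123", "abc!123", "abcdef", "1abcde"];
var comparison = [];
for (var i = 0; i < samples.length; i++) {
  comparison.push({
    input: samples[i],
    original: original.test(samples[i]),
    suggested: suggested.test(samples[i])
  });
}
// "abc123":  both accept
// "abc!123": both reject (non-alphanumeric character)
// "abcdef":  both reject (no digit)
// "1abcde":  original accepts, suggested rejects (digit only at position 0)
```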
Comment by Matthew on 2 June 2010:
Works perfectly. I was having issues with IE 6 and 7, but not IE 8 or FF 3, etc. After applying your fix, they all work as expected! Excellent.
Pingback by A JScript/VBScript Regex Lookahead Bug « DotNet Strings on 6 September 2010:
[…] to a blog post by Steve that describes the bug with a password-complexity regex. However, the bug description there is […]
Comment by Jesse on 21 September 2010:
Did anyone actually file a bug at Microsoft Connect for this? And does this still exist after all the years of service packs and updates?
Comment by B. David Holt on 30 September 2010:
It looks like the capturing is the problem. My solution:
follow every lookahead (except the last one, if it is the ending of the regex) with:
.*?
If your regex ends with a lookahead, follow it with
.*
IE7 appears to want to end the validation with a capture.
Comment by Praveen on 7 April 2011:
I need help with a regular expression for strong passwords. The conditions are:
Password must contain at least 8 characters, including at least 2 numbers
Must be allowed:
1abcdef2
21abcdef
abcdef22
1abcd4ef
1a$bcdef
1abcd^f2
12345678
1234567d
1234567%
1234567%33434
1234567%ddd
1234567dd%
Not Allowed
abcdef5%
It has to work in IE 7 and IE 8.
I have a regular expression (ValidationExpression="(?=(?:.*?\d){2})(?=(?:.*?[A-Za-z@#$%^&+=]){2}).{8,}"),
but it is not working in IE 7.
Pingback by RegEx problem with IE7 when trying validate Email address on 7 October 2011:
[…] Now this works allright with IE8 –> and latest Mozilla and Opera version for example. I already read about this article: http://blog.stevenlevithan.com/archives/regex-lookahead-bug […]
Comment by sujata on 27 March 2012:
Hi,
I am using following regex to check to permit only Alphanumeric characters including underscore, period, hyphen and space.
var reg = /^(?:[a-zA-Z_0-9\.]+ ?)*$/;
It works fine in IE8 and the latest Mozilla, but it is not working in IE7.
Any thoughts?
Thanks in advance.
Pingback by Hacking lookahead to mimic intersection, subtraction and negation | Lea Verou on 13 May 2012:
[…] /^(?=.*\d)(?=.*[a-z])(?=.*[\W_]).{6,}$/i. Note that if you want to support IE8, you have to take this bug into account and modify the pattern […]
Comment by Laner on 13 July 2012:
i’m having an issue with email validation with this bug if anyone can help.
^\w+([\.-]?\w+)*@\w+([\.-]?\w+)*(\.\w{2,3})+$
works on every modern browser including IE8 but returns as invalid email in IE7 or below.
Comment by Laner on 17 July 2012:
Disregard my post. The regex was working fine; the problem was that two input boxes shared the same name attribute, so the validation script was failing.
Pingback by regex - Regular expression to ensure that the string contains at least one lowercase char, uppercase char, digits and symbols on 27 August 2019:
[…] There is a really confusing IE/JScript bug: blog.stevenlevithan.com/archives/regex-lookahead-bug […]
Pingback by regex - Regular expression to ensure that the string contains at least one lowercase char, uppercase char, digits and symbols on 15 September 2019:
[…] There is a really confusing IE/JScript bug: blog.stevenlevithan.com/archives/regex-lookahead-bug […]
Pingback by ASP.NET Regular Expression Validator (Password Strength) – inneka.com on 30 January 2020:
[…] Edit4: More reading: http://blog.stevenlevithan.com/archives/regex-lookahead-bug […]