A JScript/VBScript Regex Lookahead Bug

Here's one of the oddest and most significant regex bugs in Internet Explorer. It can appear when using optional elision within lookahead (e.g., via ?, *, {0,n}, or (.|); but not +, interval quantifiers starting from one or higher, or alternation without a zero-length option). An example in JavaScript:

/(?=a?b)ab/.test("ab");
// Should return true, but IE 5.5 – 8b1 return false

/(?=a?b)ab/.test("abc");
// Correctly returns true (even in IE), although the
// added "c" does not take part in the match

I've been aware of this bug for a couple years, thanks to a blog post by Michael Ash that describes the bug with a password-complexity regex. However, the bug description there is incomplete and subtly incorrect, as shown by the above, reduced test case. To be honest, although the errant behavior is predictable, it's a bit tricky to describe because I haven't yet figured out exactly what's happening internally. I'd recommend playing with variations of the above code to get a better understanding of the problem.

Fortunately, since the bug is predictable, it's usually possible to work around. For example, you can avoid the bug with the password regex in Michael's post (/^(?=.*\d)(?=.*[a-z])(?=.*[A-Z]).{8,15}$/) by writing it as /^(?=.{8,15}$)(?=.*\d)(?=.*[a-z])(?=.*[A-Z]).*/ (the .{8,15}$ lookahead must come first here). The important thing is to be aware of the issue, because it can easily introduce latent and difficult to diagnose bugs into your code. Just remember that it shows up with variable-length lookahead. If you're using such patterns, test the hell out of them in IE.