Update: This version of XRegExp is outdated. See XRegExp.com for the latest, greatest version.
I use regular expressions in JavaScript fairly frequently, and although the exec() method is badass and I love the ability to use a function to generate the replacement in the replace() method, JavaScript regexes lack some very significant features available in many other languages. Amongst my biggest pet peeves is the lack of support for the s and x flags, which should enable "dot matches all" (a.k.a. single-line) mode and "free-spacing and comments" mode, respectively. These modifiers are available in almost every other modern regex library.
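For comparison, here is what the s flag changes, using only native regexes: a dot stops at a line break, while a class such as [\S\s] (a common native stand-in for "any character") does not. This is a small illustration; the sample string is made up.

```javascript
// Native JavaScript regexes only; no XRegExp is involved in this comparison.
var text = "first line\nsecond line";

// The dot does not match the newline, so this match ends at "first line".
var dotMatch = /first.*/.exec(text)[0];

// [\S\s] matches any character, including line breaks, which is how a dot
// behaves in "dot matches all" (s flag) mode.
var anyMatch = /first[\S\s]*/.exec(text)[0];

console.log(JSON.stringify(dotMatch)); // "first line"
console.log(JSON.stringify(anyMatch)); // "first line\nsecond line"
```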
To remedy this, I've created a (very small) script which extends the JavaScript RegExp constructor, enabling the aforementioned flags. Basically, it gives you a constructor called XRegExp which works exactly like the RegExp constructor except that it also accepts s and x as flags, in addition to the already supported g (global), i (case insensitive), and m (multiline, i.e., ^ and $ match at line breaks). As a bonus, XRegExp also improves cross-browser regex syntax consistency.
Here's how it works:
var regex = new XRegExp("te?", "gi");
var value = "Test".replace(regex, "z");
Look familiar? If you've used regular expressions in JavaScript before, it should: it's exactly like using RegExp. However, so far we haven't done anything that can't be accomplished using the native RegExp constructor. The s flag is pretty self-explanatory (specific details can be found in the FAQ, below), so here's an example with the x flag:
var email = new XRegExp(
	"\\b " +
	"# Capture the address to $1 \n" +
	"( " +
	"    \\w[-.\\w]* # username \n" +
	"    @ " +
	"    [-.a-z0-9]+\\.(?:com|net) # hostname \n" +
	") " +
	"\\b ", "gix");
value = value.replace(email, "<a href=\"mailto:$1\">$1</a>");
That's certainly different! A couple of notes:
- When using XRegExp, the normal string escape rules (preceding special characters with "\") are necessary, just as with RegExp. Hence, the three instances of \n are metasequences within the string literal itself. JavaScript converts them to newline characters (which end the comments) before XRegExp sees the string.
- The email regex is overly simplistic and only intended for demonstrative purposes.
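With the whitespace and comments taken out, the free-spacing pattern above should be equivalent to the compact native literal below, with the same g and i flags. This is a sketch for comparison only; the sample input is made up.

```javascript
// Compact native equivalent of the free-spacing email pattern. The x flag
// only changes how the pattern is written, not what it matches.
var email = /\b(\w[-.\w]*@[-.a-z0-9]+\.(?:com|net))\b/gi;

var value = "Write to jane.doe@example.com for details.";
value = value.replace(email, "<a href=\"mailto:$1\">$1</a>");

console.log(value);
// Write to <a href="mailto:jane.doe@example.com">jane.doe@example.com</a> for details.
```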
That's fairly nifty, but we can make this even easier. If you run the following line of code:
XRE.overrideNative();
…Like magic, the RegExp constructor itself will support the s and x flags from that point forward. The tradeoff is that you will then no longer be able to access information about the last match as properties of the global RegExp object. However, those properties are all officially deprecated anyway, and you can access all the same info through a combination of properties on regex instances and use of the exec() method.
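As a rough sketch of that substitution, the loop below takes from each exec() result the values that older code might read from RegExp.$1, RegExp.lastMatch, RegExp.leftContext, RegExp.rightContext, and RegExp.input. Only native features are used; the variable names and the input string are illustrative.

```javascript
// Native regexes only. The input string is made up for illustration.
var re = /(\d+)-(\d+)/g;
var str = "ranges: 10-20, 30-40";
var results = [];
var match;

// With the g flag, each exec() call resumes searching at re.lastIndex.
while ((match = re.exec(str)) !== null) {
	var end = match.index + match[0].length;
	results.push({
		lastMatch: match[0],                     // instead of RegExp.lastMatch
		$1: match[1],                            // instead of RegExp.$1
		$2: match[2],                            // instead of RegExp.$2
		leftContext: str.slice(0, match.index),  // instead of RegExp.leftContext
		rightContext: str.slice(end),            // instead of RegExp.rightContext
		input: match.input                       // instead of RegExp.input
	});
}

console.log(results.length);                         // 2
console.log(results[0].$1 + " to " + results[0].$2); // 10 to 20
console.log(re.lastIndex);                           // 0 (reset by the final, failed search)
```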
Here's a quick FAQ. For the first two questions, I've borrowed portions of the explanations from O'Reilly's Mastering Regular Expressions, 3rd Edition.
- What exactly does the s flag do?
- Usually, dot does not match a newline. However, a mode in which dot matches a newline can be as useful as one where dot doesn't. The s flag allows the mode to be selected on a per-regex basis. Note that dots within character classes (e.g., [.a-z]) are always equivalent to literal dots.
As for what exactly is considered a newline character (and therefore not matched by dots outside of character classes unless using the s flag), according to the Mozilla Developer Center it includes the four characters matched by the following regex: [\n\r\u2028\u2029]
- What exactly does the x flag do?
- First, it causes most whitespace to be ignored, so you can "free-format" the expression for readability. Secondly, it allows comments with a leading #.
Specifically, it turns most whitespace into an "ignore me" metacharacter, and # into an "ignore me, and everything else up to the next newline" metacharacter. They aren't taken as metacharacters within a character class (which means that classes are not free-format, even with x), and as with other metacharacters, you can escape whitespace and # that you want to be taken literally. Of course, you can always use \s to match whitespace, as in new XRegExp("<a \\s+ href=…>", "x"). Note that describing whitespace and comments as ignore-me metacharacters is not quite accurate; it might be better to think of them as do-nothing metacharacters. This distinction is important with something like \12 3, which with the x flag is taken as \12 followed by 3, and not \123. Finally, don't immediately follow whitespace or a comment with a quantifier (e.g., * or ?), or you will quantify the do-nothing metacharacter.
As for what exactly is whitespace, according to the Mozilla Developer Center it is equivalent to all characters matched by the following regex:
[\t\n\v\f\r \u00a0\u2000\u2001\u2002\u2003\u2004\u2005\u2006\u2007\u2008\u2009\u200a\u200b\u2028\u2029\u3000]
- Can the s and x flags be used together?
- Yes. You can combine all supported flags (g, i, m, s, x) in any order.
- What regex syntax does XRegExp support?
- Whatever your browser supports natively.
- You mentioned something about improving cross-browser regex syntax consistency?
- When using XRegExp, a leading, unescaped ] within a character class is considered a literal character and therefore does not end the class. This is consistent with other regex engines I've used, and is true in Internet Explorer natively. However, in Firefox the following quirks (bugs?) can be experienced:
  - [^] is equivalent to [\S\s], although it should throw an error.
  - [^]] is equivalent to [\S\s]], although it should be equivalent to [^\]] or (?!])[\S\s].
  - [] is equivalent to (?!) (which will never match), although it should throw an error.
  - []] is equivalent to (?!)] (which will never match), although it should be equivalent to [\]] or ].
When using XRegExp (or RegExp with XRE.overrideNative()), you don't have to worry about how different browsers handle this, as a leading ] within a character class will never end the class.
- Which regex-related methods does XRegExp support?
- All of them.
- Are regexes built using XRegExp any slower than they would be otherwise?
- No.
- Does it take any longer to build regexes using XRegExp than it would otherwise?
- Yes, by a tiny amount. From personal testing, building regexes using XRegExp typically takes less than a millisecond longer than it would otherwise. This is especially trivial given that regexes should not need to be constructed within loops. Instead, a regex should be assigned to a variable before entering a loop, to avoid rebuilding it during each iteration of the loop.
- Which browsers has this been tested with?
- Firefox 2, Internet Explorer 5.5–7, and Opera 9.
- How big is the script file?
- Minified, it's less than 1KB. Gzipping reduces it further.
- What license is this released under?
- The MIT License.
- Does XRegExp affect regular expression literals?
- No. Even when using XRE.overrideNative(), Perl-style regex literals (e.g., /pattern/gi) are unaffected.
- I found a bug. Why do you suck so bad?
- Are you sure the bug is in XRegExp? Regular expression syntax is somewhat complex and often changes its meaning given context. Additionally, metasequences within JavaScript string literals can change things before XRegExp ever sees your regex. In any case, whether or not you're sure you know what's causing the issue, feel free to leave a comment and I'll look into it ASAP.
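To make the do-nothing rule from the x flag answer concrete, here is a simplified sketch. stripFreeSpacing is a hypothetical helper, not XRegExp's own code; it handles escapes, character classes, #-comments, and whitespace, and skips edge cases such as unclosed character classes.

```javascript
// Hypothetical helper for illustration only; it is not part of XRegExp and
// does not handle edge cases such as unclosed character classes.
function stripFreeSpacing(pattern) {
	// At each position, try in order: an escape sequence, a character class,
	// a #-comment (up to the end of the line), or a run of whitespace.
	var token = /\\[\S\s]|\[\^?\]?(?:[^\\\]]|\\[\S\s])*\]|#[^\n\r]*|\s+/g;
	return pattern.replace(token, function (match) {
		var first = match.charAt(0);
		// Escapes (including "\ " and "\#") and whole character classes are
		// kept exactly as written, so classes are not free-format.
		if (first === "\\" || first === "[") return match;
		// Whitespace and comments become the do-nothing group (?:) instead of
		// disappearing, so the tokens on either side cannot merge.
		return "(?:)";
	});
}

// "\12 3" keeps \12 and 3 apart instead of collapsing into \123.
console.log(stripFreeSpacing("\\12 3")); // \12(?:)3

// Whitespace and the comment are neutralized; the class [ #] is left alone.
var range = new RegExp(stripFreeSpacing("\\d+ - \\d+  # a numeric range\n[ #]?"));
console.log(range.test("10-20"));   // true
console.log(range.test("10 - 20")); // false
```

Replacing each ignored span with (?:) rather than deleting it is what keeps \12 and 3 separate; the cost, as the FAQ warns, is that a quantifier placed right after whitespace or a comment applies to that empty group.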
Here's the script, with comments:
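var XRegExp = function(pattern, flags){
	if(!flags) flags = "";

	// x flag: outside character classes, replace each run of whitespace and
	// each #-comment with the do-nothing group (?:), so that "\12 3" still
	// means \12 followed by 3 rather than \123
	if(flags.indexOf("x") !== -1){
		pattern = pattern.replace(XRE.re.xMod, function($0, $1, $2){
			return (/[^[]/.test($2.charAt(0)) ? $1 + "(?:)" : $0);
		});
	}

	// Convert literal line breaks into \r and \n escapes (a literal line
	// break is not allowed inside the regex literal built below), then
	// escape a leading ] in each character class so it never ends the class
	pattern = pattern.replace(XRE.re.badChr, function($0, $1, $2){
		return $1 + $2.replace(/\r/, "\\r").replace(/\n/, "\\n");
	}).
	replace(XRE.re.chrClass, function($0, $1, $2){
		return $1 + $2.replace(/^(\[\^?)]/, "$1\\]");
	});

	// s flag: outside character classes, replace each unescaped dot with
	// [\S\s], which matches any character including newlines
	if(flags.indexOf("s") !== -1){
		pattern = pattern.replace(XRE.re.chrClass, function($0, $1, $2){
			return $1.replace(XRE.re.sMod, function($0, $1, $2){
				return $1 + ($2 === "." ? "[\\S\\s]" : "");
			}) + $2;
		});
	}

	// Escape unescaped forward slashes, which would otherwise end the regex
	// literal early (and could let a pattern inject code into eval), and use
	// (?:) for an empty pattern, since "//" would be read as a comment
	pattern = pattern.replace(/\\[\S\s]|\//g, function($0){
		return $0 === "/" ? "\\/" : $0;
	}) || "(?:)";

	// Build a regex literal with eval rather than calling RegExp, so this
	// still works after XRE.overrideNative() has replaced RegExp with
	// XRegExp; s and x were handled above, so they are stripped here
	return eval("/" + pattern + "/" + flags.replace(/[sx]+/g, ""));
},
XRE = {
	// Make the RegExp constructor itself accept the s and x flags
	overrideNative: function(){
		RegExp = XRegExp;
	},
	re: {
		// $1: text outside character classes; $2: the class that follows, if any
		chrClass: /((?:[^[\\]+|\\(?:[\S\s]|$))*)((?:\[\^?]?(?:[^\\\]]+|\\(?:[\S\s]|$))*]?)?)/g,
		// $1: text outside classes up to the next whitespace or #;
		// $2: a character class, a comment with its surrounding whitespace, or whitespace
		xMod: /((?:[^[#\s\\]+|\\(?:[\S\s]|$))*)((?:\[\^?]?(?:[^\\\]]+|\\(?:[\S\s]|$))*]?|\s*#[^\n\r]*[\n\r]?\s*|\s+)?)/g,
		// $1: text up to the next unescaped dot; $2: that dot, if any
		sMod: /((?:[^\\.]+|\\(?:[\S\s]|$))*)(\.?)/g,
		// $1: text up to the next line break; $2: that line break, if any
		badChr: /((?:[^\\\r\n]+|\\(?:[^\r\n]|$(?!\s)))*)\\?([\r\n]?)/g
	}
};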
Download it here.