XRegExp 1.0

After stalling for nearly a year, I've finally released XRegExp 1.0, the next generation of my JavaScript regular expression library. Although it doesn't add support for lookbehind (as I've previously suggested) due to what would amount to significant inherent limitations, it fixes a couple bugs, corrects even more cross-browser regex inconsistencies, and adds a suite of new regular expression functions and methods that make writing regex-intensive JavaScript applications easier than ever. One of these new functions, XRegExp.addToken, fundamentally changes XRegExp's implementation and allows you to easily create your own XRegExp plugins.

Here's XRegExp's abbreviated feature list from the brand new xregexp.com (which includes extensive documentation and code examples):

Adds new regex and replacement text syntax, including comprehensive support for named capture.
Adds two new regex flags: s, to make dot match all characters (aka singleline mode), and x, for free-spacing and comments (aka extended mode).
Provides a suite of 12 functions and methods that make complex regex processing a breeze.
Automagically fixes the most commonly encountered cross-browser inconsistencies in regex behavior and syntax.
Lets you easily create and use plugins that add new syntax and flags to XRegExp's regular expression language.
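
As a rough illustration of what named capture looks like in practice, here is a minimal sketch. It uses the native named groups that JavaScript engines gained in ES2018 rather than XRegExp's own API, and the pattern and sample date are made up for the example.

```javascript
// Native ES2018 named groups; illustrative only, not XRegExp's API.
const datePattern = /(?<year>\d{4})-(?<month>\d{2})-(?<day>\d{2})/;

const match = datePattern.exec("Released on 2009-06-24.");
const released = match.groups; // named captures keyed by group name

// In native JavaScript, replacement text refers to a named group as $<name>.
const reordered = "2009-06-24".replace(datePattern, "$<month>/$<day>/$<year>");

console.log(released.year, released.month, released.day, reordered);
```

The point of naming groups is that the pattern can be restructured without renumbering every backreference that depends on it.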

The full list of changes can be seen in the changelog. Please let me know if you find any bugs or have any suggestions for the library. I'd also love to hear about projects or sites that are using XRegExp (I've got a few listed on the XRegExp homepage now).

29 thoughts on “XRegExp 1.0”

Scott Trenda says:

June 24, 2009 at 9:14 am

I’m having trouble connecting to the xregexp.com site. However, I can connect to stevenlevithan.com just fine, and I got an Apache error on xregexp.com saying something about there being an error with the content coming from stevenlevithan.com. Do you have a mirror of the xregexp.com content on stevenlevithan.com, and could you post it here if you do?

Looking forward to trying this one out!
Steven Levithan says:

June 24, 2009 at 12:40 pm

@Scott Trenda, try it again, please: xregexp.com. I’ve been having trouble with my hosting account recently, but it’s working more than 95% of the time. blog.stevenlevithan.com is on a different server, but stevenlevithan.com, xregexp.com, and regexpal.com are all sharing the same ColdFusion shared-hosting account. I’d prefer to avoid maintaining an HTML-only or PHP copy of the XRegExp site on this server.
Trent says:

July 2, 2009 at 2:00 pm

This looks pretty sweet, your first bullet item “name capture” already has me sold. While I’m not a regexp guru I enjoy writing a good regular expression from time to time. I’ve not tried it yet but will be running some examples to get my feet wet! Thanks for your work!
ds14050 says:

July 3, 2009 at 10:20 am

Hi. I looked over xregexp.js (1.0.1) and found a (?!) in it.
Did you know that /(?!)/.test("") peculiarly returns true in Firefox (3.0.11 and 3.5RC3)?
That’s why XRegExp("[]").test("…") leads to different results depending on the browser.
(?!|), or something else I’m not aware of, might help you. Thank you.
Steven Levithan says:

July 3, 2009 at 11:29 am

@ds14050, thanks for the info. I also saw the post at http://vvvvvv.sakura.ne.jp/ds14050/diary/20090703.html#p02 , which I assume is the source for your observation.

The handling noted for (?!) and (?=) is a bug in Firefox. It’s not consistent, though–e.g. /(?!)a/.test("a") returns true, whereas /a(?!)/.test("a") returns false. However, this issue’s minor impact on XRegExp’s handling of empty character classes in Firefox is unlikely to affect any real-world code. The point of harmonizing the behavior of empty classes in XRegExp is to prevent the traditional, non-ES3-compliant meaning from working in IE and older versions of Safari so that bugs in your code are spotted sooner.

In any case, I’ve gone ahead and fixed this issue in XRegExp 1.1.0.

At the blog post I referenced, the author’s proposed (?!|) (which you also mentioned) is a bad alternative, because it should work identically to (?!) and thus its meaning may change in future browser versions. It also adds unnecessary backtracking. I recommend using \b\B instead.
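
A small sketch of the behavior discussed here, assuming a spec-compliant engine (results in the old browsers mentioned above may differ). The variable names are only for illustration.

```javascript
// In ES3-compliant engines, [] matches nothing and [^] matches any character.
const emptyClass = /[]/;
const anyChar = /[^]/;

// \b\B asks for a position that both is and is not a word boundary,
// so it can never match, and it has no alternation to try.
const neverMatches = /\b\B/;

const samples = ["", "a", "a b", "...", "\n"];
const results = samples.map(function (s) {
  return {
    input: s,
    emptyClass: emptyClass.test(s),
    never: neverMatches.test(s)
  };
});

console.log(results, anyChar.test("\n"));
```

Every `emptyClass` and `never` entry should come out false in a compliant engine, whatever the input.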
Richard says:

July 28, 2009 at 4:34 pm

Hello – love the library, but don’t know if it’s possible to apply the \U \E flags for string replacement? I’ve been trying to get a regexp to convert specific letters to uppercase to no avail.
Steven Levithan says:

August 3, 2009 at 12:04 pm

@Richard, that cannot be implemented in a library–it would have to be implemented in the JavaScript language itself. That’s because \U and \E in other languages are escape sequences, but in JavaScript you’d have to enter them as "\\U...\\E" in order for the replace method to see the backslashes (i.e., "\U\E" is equivalent to "UE"). And if you allow the user to escape the backslashes to get this functionality, well, how do you enter a literal \U sequence? And so on…
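
To make the escaping point concrete, and to show one common workaround, here is a short sketch. The helper name upperFirstLetters is hypothetical; it relies only on the standard callback form of String.prototype.replace, not on any XRegExp feature.

```javascript
// In a string literal, an unrecognized escape such as \U is just the letter.
const unescaped = "\U\E";        // the replace method would only ever see "UE"
const doubled = "\\U...\\E";     // backslashes survive only when doubled

// A replacement function can do the case conversion instead.
function upperFirstLetters(text) {
  return text.replace(/\b[a-z]/g, function (ch) {
    return ch.toUpperCase();
  });
}

console.log(unescaped === "UE", doubled.length, upperFirstLetters("hello regex world"));
```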
Eduardo Romero says:

September 2, 2009 at 1:07 pm

Hi Steve, first of all thanks for all your work!

I updated to Firefox 3.5.2, and something that used to work is giving me an error now:

var hrefs = XRegExp.matchRecursive(orig, '<a\\b(?:[^>](?!/>))*>', '</a>', 'g');

Gives me:

invalid regular expression flag g

I updated to 1.1.0 but still get that same error. Any ideas what could have happened? It used to work as expected.
Eduardo Romero says:

September 2, 2009 at 5:06 pm

Me again. It looks like this could be some sort of Firefox oddity; strangely enough, it only happens in 3.5.2, with no errors in 3.0.1.

In xregexp-matchrecursive.js, around lines 45 and 46, the flags are built as "g" + flags, so passing g ends up as gg, which Firefox rejects.

To fix it I added:

flags = flags.replace(/g/g, ""),

Right before these lines:

left = left instanceof RegExp ? (left.global ? left : left.addFlags("g")) : XRegExp(left, "g" + flags),
right = right instanceof RegExp ? (right.global ? right : right.addFlags("g")) : XRegExp(right, "g" + flags),

See the full code here: http://bin.cakephp.org/saved/50306

Hope it saves somebody trouble. Thanks again for the code Steve.
Steven Levithan says:

September 3, 2009 at 1:03 am

@Eduardo Romero, thanks for the report and fix. I’ll make sure this gets fixed in the next version.
Mike says:

September 30, 2009 at 3:01 pm

I am also running into the invalid expression flag error, but with 1.1.0 of xregexp.js.

My fix: replace line 75 with:

flags = real.replace.call(flags, /g(?=.*g)|i(?=.*i)|m(?=.*m)|y(?=.*y)|[^gimy]+/g, "");
regex = RegExp(output.join(""), flags, "");

You should consider setting up a project on Google code or somewhere that will provide a bug tracker and maybe a wiki.
Mike says:

September 30, 2009 at 3:33 pm

Also, you shouldn’t use the window global. Window is part of the DOM spec, and won’t be defined in JavaScript implementations other than web browsers.

Instead you could put this at the top:

var XRegExp;

if (!XRegExp) {

and replace: replacement.apply(window, arguments)

With: replacement.apply(XRegExp, arguments)
Ildar says:

October 2, 2009 at 4:09 am

I think that depending on the ‘window’ object is not an excellent idea, because it means your library can be used only within browsers. That is a problem for use outside browsers, such as in JScript/WSH. Maybe a more general approach is to use one of the following solutions:

1.
if (!this.XRegExp) {

2.
if ('undefined' == typeof XRegExp) {
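
A minimal sketch of the second guard, assuming a made-up library name (DemoLib stands in for the real global, and its run method is purely hypothetical):

```javascript
// DemoLib is a hypothetical stand-in for a library's global name.
var DemoLib;

// typeof never throws for an undeclared or undefined name, so this guard
// works in browsers, JScript/WSH, and other hosts that have no window object.
if (typeof DemoLib === "undefined") {
  DemoLib = {
    // Invoke a callback with the library itself as `this`, not window.
    run: function (callback, args) {
      return callback.apply(DemoLib, args);
    }
  };
}

const sum = DemoLib.run(function (a, b) { return a + b; }, [2, 3]);
const sawLibraryAsThis = DemoLib.run(function () { return this === DemoLib; }, []);

console.log(sum, sawLibraryAsThis);
```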
Mike says:

October 2, 2009 at 12:02 pm

I don’t know why I didn’t think to use backreferences. My fix could better be expressed as /([gmiy])(?=.*\1)|[^gmiy]/g.
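
Here is a sketch of how that backreference-based cleanup might be wrapped up. The function names cleanFlags and makeGlobal are hypothetical, and the flag set is limited to the g, i, m, and y flags named above.

```javascript
// Drops repeated flags and anything outside g, i, m, y.
// For a repeated flag, the last occurrence is the one that survives.
function cleanFlags(flags) {
  return String(flags).replace(/([gimy])(?=.*\1)|[^gimy]+/g, "");
}

// Builds a global regex from a pattern source plus caller-supplied flags,
// so that a caller passing "g" no longer produces the invalid "gg".
function makeGlobal(source, flags) {
  return new RegExp(source, cleanFlags("g" + (flags || "")));
}

const re = makeGlobal("a", "g");

console.log(re.global, cleanFlags("gig"), cleanFlags("gxi"));
```

Cleaning the combined string once, at the point where the regex is constructed, covers both the doubled g from the earlier report and any other duplicate a caller might pass in.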
Mike says:

October 6, 2009 at 11:13 am

Another bug fix. xregexp can cause IE6 to insert “undefined” into a replacement string.

For example:
"ab".replace(/(a)|(b)/, '$2')
yields:
"undefinedb"

I fixed this by replacing:
return ($1 ? args[$1] : "$") + literalNumbers;
with:
return ($1 ? args[$1] || "" : "$") + literalNumbers;

I don’t know if this is a problem in newer versions of IE.
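
The idea behind that fix can be sketched in isolation. The expandBackrefs helper below is hypothetical and deliberately simplified; it does not reproduce XRegExp's actual replacement logic or the full set of $-token rules.

```javascript
// Expands $1..$99 in a replacement string from a match array.
// Groups that did not participate are undefined, so fall back to "".
function expandBackrefs(replacement, match) {
  return replacement.replace(/\$(\d\d?)/g, function (token, n) {
    const index = Number(n);
    if (index === 0 || index >= match.length) return token; // leave as literal
    return match[index] || "";
  });
}

const m = /(a)|(b)/.exec("ab");   // m[2] is undefined for this match
const expanded = expandBackrefs("[$1][$2]", m);

// A spec-compliant native replace treats the missing group as empty too.
const nativeResult = "ab".replace(/(a)|(b)/, "$2");

console.log(expanded, nativeResult);
```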
Thejesh GN says:

October 8, 2009 at 3:29 am

Sweet. Thanks for the library.
Baka says:

October 14, 2009 at 11:56 am

Sorry to note that, but some of the fixes there –
http://xregexp.com/tests/split.html –
break something in Opera browser (I tried 7.54, 8.54, 9.21, 9.24, and 9.27) –
three lines become red in the last column (all being green in the third one).
Here’s a screenshot: http://img63.imageshack.us/img63/5654/imageputscreenshotc.jpg
Steven Levithan says:

October 20, 2009 at 2:16 am

@Baka, thanks for the report. I’ve verified the issue in Opera 9.01 and 9.27, which appears to be due to a bug in those old versions. However, Opera 9.27 is the last version with the problem, as Opera 9.5x, 9.6x, and 10 all fully pass the test script.

@Mike, thanks! Sorry I’ve been slow to respond, but all of the issues you mentioned have now been fixed in XRegExp 1.2.0. Also, you can post bugs here if you prefer: http://code.google.com/p/xregexp/
Aeron says:

November 1, 2009 at 1:08 pm

Steven,

I think I found a bug:

http://regexpal.com/?flags=ms&regex=^((25[0-5]|(2[0-4]|1\d|[1-9]%3F)\d)(\.(%3F%3D.)|(%3F!.))){4}%24&input=1.2.3.4%0A1.2.3.4

That website uses XRegExp, and I have the ‘m’ flag on, but it only counts the 2nd line as valid. The problem appears to be the lookahead looking _at_ the newline character instead of acting as if it were at the end of the string.
Aeron says:

November 1, 2009 at 1:27 pm

Related:

http://regexpal.com/?flags=ms&regex=^((25[0-5]|(2[0-4]|1\d|[1-9]%3F)\d)(\.(%3F%3D.)|)){4}%24&input=1.2.3.5.%0A1.2.3.4%0A

Counts both as valid, yet only the 2nd is valid.
Steven Levithan says:

November 2, 2009 at 1:25 am

@Aeron, no. First of all, RegexPal currently uses an old version of XRegExp (so it’s not a good way to test bugs in the latest version), and secondly, RegexPal uses native JavaScript regex syntax, so it’s not a good way to test XRegExp at all (it’s better for testing your browser’s native regex support).

With your first regex, you have the /s flag turned on, so it’s absolutely correct that only the second line matches (since the dot matches any character including line breaks). Turning off /s allows the regex to match both lines, which again is correct. In IE, turning off /s does not allow the regex to match both lines, but that is because of an IE regex bug (possibly the bug described at https://blog.stevenlevithan.com/archives/regex-lookahead-bug or something similar). You can easily test the difference cross-browser using this stripped down version of your regex: javascript:alert(/\d(?!.)/m.test("1\r"));. In IE it returns false; in other browsers, true.

With your second regex, again, everything is as it should be.
Aeron says:

November 2, 2009 at 3:44 am

@Steven

Thanks for the reply.

I’m using Firefox, but anyway, I noticed the same problem in Java.
The Java documentation says:
“When in MULTILINE mode $ matches just before a line terminator or the end of the input sequence” (http://java.sun.com/j2se/1.5.0/docs/api/java/util/regex/Pattern.html#lt)
So when I’m at the end of one line among many and have the ‘m’ flag on, I expect, going by what Java says, that when I perform a lookahead with ‘.’ from that point _and_ have the ‘s’ flag on, it will not see the line terminator in any way.

So for now I presume that a lookahead does not obey the ‘m’ flag and does not treat line terminators as EOL.

Java bug?
Not specified in the RegEx world?
Another flag to introduce to overcome this?
The end of the world?
Aeron says:

November 2, 2009 at 4:03 am

Flags: ms
^a(?=.)$ with “a\n” matches but shouldn’t.
^a(?!.)$ with “a\n” does not match but should.

Imo, with multiline enabled a lookAhead should ignore the line terminator (and the rest of the expression) if the following statement is a ‘$’.
Steven Levithan says:

November 2, 2009 at 10:08 am

It is not a Java bug, and it is well specified in the specs for Java, JavaScript, and most other regex flavors. In Java, of course, the singleline option is called dotall (a more descriptive and therefore superior name), and JavaScript doesn’t have a native equivalent (the /s flag in RegexPal comes from XRegExp, which silently converts the appropriate dots to [\s\S]).

Lookahead obeys /m just fine, but /m has no impact on the dot (whereas /s does). I think you are a bit confused about what the /m and /s flags actually do (this is a common point of confusion since the flags are named so terribly). Please see https://blog.stevenlevithan.com/archives/singleline-multiline-confusing for the details.

XRegExp("^a(?=.)$", "ms").test("a\n") should and does return true.
XRegExp("^a(?!.)$", "ms").test("a\n") should and does return false.

This is long-established, cross-library-consistent regex behavior (albeit confusing due to the flag names).
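
The same results can be checked without XRegExp by writing the dot-with-/s case as [\s\S], which is the substitution described above. This is a sketch using native regex literals only; the property names are just labels.

```javascript
// Without /s, dot excludes line terminators; [\s\S] stands in for "dot with /s".
// The /m flag only changes what ^ and $ match, not what the dot matches.
const cases = {
  lookaheadDot: /^a(?=.)$/m.test("a\n"),
  lookaheadAny: /^a(?=[\s\S])$/m.test("a\n"),
  negLookaheadDot: /^a(?!.)$/m.test("a\n"),
  negLookaheadAny: /^a(?![\s\S])$/m.test("a\n"),
  carriageReturn: /\d(?!.)/m.test("1\r")
};

console.log(cases);
```

In a compliant engine the two "Any" cases reproduce the true and false results listed above, while the two "Dot" cases give the opposite answers because the plain dot cannot see the newline.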
Pingback: Regular expressions « Eikonal Blog
lyrics says:

August 5, 2011 at 4:38 pm

Thanks for all your work! I really appreciate it. It helped me developing websites easily.
Jim O'Callaghan says:

December 13, 2011 at 12:01 pm

Thanks for the great library. I think I found a bug using jQuery 1.7.1 with xregexp 1.5.0, IE6 only – around line 270:

RegExp.prototype.exec = function (str) {
var match = real.exec.apply(this, arguments),

needs to change to:

RegExp.prototype.exec = function (str) {
str = String(str);
var match = real.exec.apply(this, arguments),

This follows the ECMAScript standard (section 15.10.6.2). I found the fix at http://blog.slaks.net/2011/09/xregexp-breaks-jquery-animations.html though his solution:

if (!str.slice)
str = String(str);

… caused me problems as slice was undefined under the error condition – the unconditional cast seems to work without issue.
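
A standalone sketch of why the unconditional cast matters. The execWithRest wrapper is hypothetical and is not XRegExp's exec override; it only shows a wrapper that goes on to call string methods on its argument.

```javascript
// A wrapper around exec that also needs string methods on its argument.
function execWithRest(regex, str) {
  // Coerce first, as the native exec does; numbers have no slice method.
  str = String(str);
  const match = regex.exec(str);
  if (match) {
    // Text following the match, computed with a string-only method.
    match.rest = str.slice(match.index + match[0].length);
  }
  return match;
}

const fromNumber = execWithRest(/\d/, 12345);
const fromString = execWithRest(/b+/, "aabbcc");

console.log(fromNumber[0], fromNumber.rest, fromString[0], fromString.rest);
```

Casting unconditionally also sidesteps the question of which property to probe for, which is where the conditional version ran into trouble.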
Steven Levithan says:

February 25, 2012 at 6:17 pm

Thanks for reporting this, Jim. The bug has been fixed in XRegExp 1.5.1.
vivek pothagoni says:

March 30, 2014 at 10:46 am

Hey Steven,

I want to use XRegExp for validation. I am new to this and the syntax is not working.

We developed an application and are now rolling it out to other countries. In the text field we should allow only characters and spaces.
[^a-zA-Z ] If the user enters anything other than this, I replace it with a blank. We want something similar for Portuguese and Chinese. A Brazilian user should enter only Portuguese characters and spaces; similarly for Chinese. I tried the XRegExp code below:

var reg = XRegExp('^\\p{InBasic_Latin}+$');
XRegExp.replace('horas são', reg, '');

Thanks in advance.
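
One possible approach to the question above, sketched with native JavaScript Unicode property escapes (ES2018, requires the u flag) rather than XRegExp's Unicode syntax. An anchored pattern such as ^…+$ can only match an entire string, so stripping individual characters needs a negated class with the g flag instead. The names below are hypothetical, and Script=Latin admits every Latin-script letter, not just those used in Portuguese.

```javascript
// Native Unicode property escapes (ES2018, need the u flag); not XRegExp syntax.
const notLatinOrSpace = /[^\p{Script=Latin} ]/gu;
const notHanOrSpace = /[^\p{Script=Han} ]/gu;

// Removes every character matched by the "disallowed" pattern.
function keepAllowed(text, disallowed) {
  return text.replace(disallowed, "");
}

// \u00e3 is the precomposed a-with-tilde; \u4f60\u597d are two Han characters.
const portuguese = keepAllowed("horas s\u00e3o 24!", notLatinOrSpace);
const chinese = keepAllowed("\u4f60\u597d 2024 ok", notHanOrSpace);

console.log(JSON.stringify(portuguese), JSON.stringify(chinese));
```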
