After stalling for nearly a year, I've finally released XRegExp 1.0, the next generation of my JavaScript regular expression library. Although it doesn't add support for lookbehind (as I've previously suggested) due to what would amount to significant inherent limitations, it fixes a couple of bugs, corrects even more cross-browser regex inconsistencies, and adds a suite of new regular expression functions and methods that make writing regex-intensive JavaScript applications easier than ever. One of these new functions, XRegExp.addToken, fundamentally changes XRegExp's implementation and allows you to easily create your own XRegExp plugins.
Here's XRegExp's abbreviated feature list from the brand-new xregexp.com (which includes extensive documentation and code examples):
- Adds new regex and replacement text syntax, including comprehensive support for named capture.
- Adds two new regex flags: s, to make dot match all characters (aka singleline mode), and x, for free-spacing and comments (aka extended mode).
- Provides a suite of 12 functions and methods that make complex regex processing a breeze.
- Automagically fixes the most commonly encountered cross-browser inconsistencies in regex behavior and syntax.
- Lets you easily create and use plugins that add new syntax and flags to XRegExp's regular expression language.
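One way to provide an s flag on engines that lack native support is to rewrite every dot outside a character class as [\s\S], a class that matches any character, including line breaks. The function below is a simplified, hypothetical sketch of that rewrite (the name dotAllSource is illustrative), not XRegExp's actual implementation; it only tracks backslash escapes and character classes and ignores the rest of XRegExp's syntax.

```javascript
// Hypothetical helper: rewrite a regex source so that every dot outside a
// character class matches any character, emulating a "dot matches all" flag.
// Simplified sketch: it only tracks backslash escapes and [...] classes.
function dotAllSource(source) {
  let out = "";
  let inClass = false;
  for (let i = 0; i < source.length; i++) {
    const ch = source[i];
    if (ch === "\\") {
      // Copy an escape sequence (e.g. \. or \]) through unchanged.
      out += ch + (i + 1 < source.length ? source[i + 1] : "");
      i++;
    } else if (inClass) {
      // Inside a class, a dot is already literal; watch for the closing ].
      if (ch === "]") inClass = false;
      out += ch;
    } else if (ch === "[") {
      inClass = true;
      out += ch;
    } else {
      out += ch === "." ? "[\\s\\S]" : ch;
    }
  }
  return out;
}

// "a.b" becomes "a[\s\S]b", which matches across a line break.
console.log(new RegExp(dotAllSource("a.b")).test("a\nb")); // true
```

Escaped dots (\.) and dots inside a class ([.]) are left alone, since both already mean a literal period.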
The full list of changes can be seen in the changelog. Please let me know if you find any bugs or have any suggestions for the library. I'd also love to hear about projects or sites that are using XRegExp (I've got a few listed on the XRegExp homepage now).
I’m having trouble connecting to the xregexp.com site. However, I can connect to stevenlevithan.com just fine, and I got an Apache error on xregexp.com saying something about there being an error with the content coming from stevenlevithan.com. Do you have a mirror of the xregexp.com content on stevenlevithan.com, and could you post it here if you do?
Looking forward to trying this one out!
@Scott Trenda, try it again, please: xregexp.com. I’ve been having trouble with my hosting account recently, but it’s working more than 95% of the time. blog.stevenlevithan.com is on a different server, but stevenlevithan.com, xregexp.com, and regexpal.com are all sharing the same ColdFusion shared-hosting account. I’d prefer to avoid maintaining an HTML-only or PHP copy of the XRegExp site on this server.
This looks pretty sweet; your first bullet item, “named capture”, already has me sold. While I’m not a regexp guru, I enjoy writing a good regular expression from time to time. I haven’t tried it yet, but I will be running some examples to get my feet wet! Thanks for your work!
Hi. I looked over xregexp.js (1.0.1) and found a (?!) in it.
Did you know that /(?!)/.test("") peculiarly returns true in Firefox (3.0.11 and 3.5RC3)?
That’s why XRegExp("[]").test("…") gives different results depending on the browser.
(?!|), or something else I haven’t thought of, might help you. Thank you.
@ds14050, thanks for the info. I also saw the post at http://vvvvvv.sakura.ne.jp/ds14050/diary/20090703.html#p02 , which I assume is the source for your observation.
The handling noted for (?!) and (?=) is a bug in Firefox. It’s not consistent, though: e.g., /(?!)a/.test("a") returns true, whereas /a(?!)/.test("a") returns false. However, this issue’s minor impact on XRegExp’s handling of empty character classes in Firefox is unlikely to affect any real-world code. The point of harmonizing the behavior of empty classes in XRegExp is to prevent the traditional, non-ES3-compliant meaning from working in IE and older versions of Safari, so that bugs in your code are spotted sooner.
In any case, I’ve gone ahead and fixed this issue in XRegExp 1.1.0.
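A few quick sanity tests show how a given engine treats these constructs. In a spec-compliant engine, each pattern below should fail to match, including \b\B, since no position can be both a word boundary and a non-boundary. This is a minimal sketch for current engines; the array name is illustrative.

```javascript
// Patterns that should never match anything in a spec-compliant engine.
// Firefox 3.0/3.5 returned true for some of the (?!) cases described above.
const neverMatches = [/(?!)/, /(?!)a/, /a(?!)/, /[]/, /\b\B/];

for (const re of neverMatches) {
  // Expect false for both an empty and a non-empty subject string.
  console.log(re.source, re.test(""), re.test("a"));
}
```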
At the blog post I referenced, the author’s proposed (?!|) (which you also mentioned) is a bad alternative, because it should work identically to (?!), and thus its meaning may change in future browser versions. It also adds unnecessary backtracking. I recommend using \b\B instead.
Hello, love the library, but is it possible to apply the \U \E flags for string replacement? I’ve been trying to get a regexp to convert specific letters to uppercase, to no avail.
@Richard, that cannot be implemented in a library; it would have to be implemented in the JavaScript language itself. That’s because \U and \E in other languages are escape sequences, but in JavaScript you’d have to enter them as "\\U...\\E" in order for the replace method to see the backslashes (i.e., "\U\E" is equivalent to "UE"). And if you allow the user to escape the backslashes to get this functionality, well, how do you enter a literal \U sequence? And so on…
Hi Steve, first of all thanks for all your work!
I updated to Firefox 3.5.2, and something that used to work is giving me an error now:
var hrefs = XRegExp.matchRecursive(orig, '<a\\b(?:[^>](?!/>))*>', '</a>', 'g');
Gives me:
invalid regular expression flag g
I updated to 1.1.0 but still get that same error. Any ideas what could have happened? It used to work as expected.
Me again. It looks like some sort of Firefox oddity; strangely enough, it only happens in 3.5.2, with no errors in 3.0.1.
In xregexp-matchrecursive.js, around lines 45 and 46, the flags are passed as "g" + flags, so passing g ends up as gg, which Firefox rejects.
To fix it I added:
flags = flags.replace(/g/g, ""),
Right before these lines:
left = left instanceof RegExp ? (left.global ? left : left.addFlags("g")) : XRegExp(left, "g" + flags),
right = right instanceof RegExp ? (right.global ? right : right.addFlags("g")) : XRegExp(right, "g" + flags),
See the full code here: http://bin.cakephp.org/saved/50306
Hope it saves somebody trouble. Thanks again for the code Steve.
@Eduardo Romero, thanks for the report and fix. I’ll make sure this gets fixed in the next version.
I am also running into the invalid regular expression flag error, but with version 1.1.0 of xregexp.js.
My fix is to replace line 75 with:
flags = real.replace.call(flags, /g(?=.*g)|i(?=.*i)|m(?=.*m)|y(?=.*y)|[^gimy]+/g, "");
regex = RegExp(output.join(""), flags);
You should consider setting up a project on Google Code or somewhere that will provide a bug tracker and maybe a wiki.
Also, you shouldn’t use the window global. The window object is part of the DOM spec and won’t be defined in JavaScript implementations other than web browsers.
Instead, you could put this at the top:
var XRegExp;
if (!XRegExp) {
and replace replacement.apply(window, arguments)
with replacement.apply(XRegExp, arguments)
I think that depending on the window object is not an excellent idea, since it means your library can be used only within browsers. That is a problem for usage outside of browsers, e.g., in JScript/WSH. A more general solution might be one of the following:
1. if (!this.XRegExp) {
2. if ('undefined' == typeof XRegExp) {
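Both suggestions avoid naming window directly. A minimal sketch of the same idea for current engines, assuming globalThis where available and falling back to the classic Function("return this")() trick in older non-strict hosts such as JScript; the XRegExp body here is a hypothetical placeholder, not the library's real constructor:

```javascript
// Find the global object without assuming a browser `window`.
// globalThis exists in ES2020+ hosts; otherwise, code created by the Function
// constructor is non-strict, so `this` inside it is the global object.
var root = typeof globalThis !== "undefined"
  ? globalThis
  : Function("return this")();

// Suggestion 2, applied as a property lookup: define the library only once.
if (typeof root.XRegExp === "undefined") {
  // Placeholder body for illustration only; not XRegExp's real constructor.
  root.XRegExp = function (pattern, flags) {
    return new RegExp(pattern, flags);
  };
}
```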
I don’t know why I didn’t think to use backreferences. My fix could better be expressed as /([gmiy])(?=.*\1)|[^gmiy]/g.
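The backreference version of the flag cleanup can be wrapped in a small function. This is a sketch: nativeFlags is a hypothetical name, and because it keeps only g, i, m, and y (y being Firefox's sticky flag at the time), it would also drop newer native flags such as u or s.

```javascript
// Hypothetical helper: strip a flag string down to the native flags g, i, m,
// and y. A flag is removed if the same flag appears again later, so only its
// last copy survives; any other character (e.g. XRegExp's s and x) is dropped.
function nativeFlags(flags) {
  return flags.replace(/([gmiy])(?=.*\1)|[^gmiy]/g, "");
}

console.log(nativeFlags("g" + "g")); // "g", so RegExp no longer sees "gg"
console.log(nativeFlags("gsxim"));   // "gim"
```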
Another bug fix: XRegExp can cause IE6 to insert "undefined" into a replacement string.
For example:
"ab".replace(/(a)|(b)/, '$2')
yields:
"undefinedb"
I fixed this by replacing:
return ($1 ? args[$1] : "$") + literalNumbers;
with:
return ($1 ? args[$1] || "" : "$") + literalNumbers;
I don’t know if this is a problem in newer versions of IE.
Sweet. Thanks for the library.
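The IE6 problem comes down to how unmatched groups expand in a replacement string: per the ECMAScript spec, $n for a group that didn't participate in the match expands to an empty string. A simplified, hypothetical sketch of that expansion (expandReplacement is an illustrative name; it handles only $1 through $9 and ignores $$, $&, and the other special patterns):

```javascript
// Hypothetical helper: expand $1-$9 in a replacement template using the
// array returned by exec(). Non-participating groups (undefined) become "",
// the same outcome as the args[$1] || "" change above.
function expandReplacement(template, match) {
  return template.replace(/\$(\d)/g, function (whole, digit) {
    const n = Number(digit);
    // $0, or a group number beyond those in the regex, stays literal.
    if (n === 0 || n >= match.length) return whole;
    return match[n] === undefined ? "" : match[n];
  });
}

const sample = /(a)|(b)/.exec("ab"); // ["a", "a", undefined]
console.log(expandReplacement("[$2]", sample)); // → "[]"
console.log("ab".replace(/(a)|(b)/, "$2"));     // → "b" in modern engines
```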
Sorry to point this out, but some of the fixes tested at http://xregexp.com/tests/split.html break something in the Opera browser (I tried 7.54, 8.54, 9.21, 9.24, and 9.27): three lines turn red in the last column, although all of them are green in the third one.
Here’s a screenshot: http://img63.imageshack.us/img63/5654/imageputscreenshotc.jpg
@Baka, thanks for the report. I’ve verified the issue in Opera 9.01 and 9.27, which appears to be due to a bug in those old versions. However, Opera 9.27 is the last version with the problem, as Opera 9.5x, 9.6x, and 10 all fully pass the test script.
@Mike, thanks! Sorry I’ve been slow to respond, but all of the issues you mentioned have now been fixed in XRegExp 1.2.0. Also, you can post bugs here if you prefer: http://code.google.com/p/xregexp/
Steven,
I think I found a bug:
http://regexpal.com/?flags=ms&regex=^((25[0-5]|(2[0-4]|1\d|[1-9]%3F)\d)(\.(%3F%3D.)|(%3F!.))){4}%24&input=1.2.3.4%0A1.2.3.4
That website uses XRegExp, and I have the ‘m’ flag on, but it only counts the second line as valid. The problem seems to be the lookahead looking _at_ the newline character instead of treating that position as the end of the string.
Related:
http://regexpal.com/?flags=ms&regex=^((25[0-5]|(2[0-4]|1\d|[1-9]%3F)\d)(\.(%3F%3D.)|)){4}%24&input=1.2.3.5.%0A1.2.3.4%0A
It counts both as valid, yet only the second is valid.
@Aeron, no. First of all, RegexPal currently uses an old version of XRegExp (so it’s not a good way to test bugs in the latest version), and secondly, RegexPal uses native JavaScript regex syntax, so it’s not a good way to test XRegExp at all (it’s better for testing your browser’s native regex support).
With your first regex, you have the /s flag turned on, so it’s absolutely correct that only the second line matches (since the dot matches any character, including line breaks). Turning off /s allows the regex to match both lines, which again is correct. In IE, turning off /s does not allow the regex to match both lines, but that is because of an IE regex bug (possibly the bug described at https://blog.stevenlevithan.com/archives/regex-lookahead-bug or something similar). You can easily test the difference cross-browser using this stripped-down version of your regex: javascript:alert(/\d(?!.)/m.test("1\r")); In IE it returns false; in other browsers, true.
With your second regex, again, everything is as it should be.
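The stripped-down test works because the dot, without an s flag, never matches a line terminator, so (?!.) succeeds right before "\r". A quick check in a modern engine (the variable name is illustrative):

```javascript
// The four ECMAScript line terminators: \n, \r, U+2028, U+2029.
const lineTerminators = ["\n", "\r", "\u2028", "\u2029"];

for (const ch of lineTerminators) {
  // Without the s flag, the dot refuses every line terminator.
  console.log(ch.codePointAt(0).toString(16), /./.test(ch)); // false each time
}

// Hence (?!.) succeeds just before "\r"; old IE wrongly returned false here.
console.log(/\d(?!.)/m.test("1\r")); // true
```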
@Steven
Thanks for the reply.
I’m using Firefox, but anyway, I noticed the same problem in Java.
The Java documentation says:
“When in MULTILINE mode $ matches just before a line terminator or the end of the input sequence” (http://java.sun.com/j2se/1.5.0/docs/api/java/util/regex/Pattern.html#lt)
So when I’m at the end of one of many lines and have the ‘m’ flag on, I expect, based on what Java says, that when I perform a lookahead with ‘.’ from that point _and_ have the ‘s’ flag on, it will not see the line terminator in any way.
So for now I presume that a lookahead does not obey the ‘m’ flag and does not treat line terminators as EOL.
Java bug?
Not specified in the RegEx world?
Another flag to introduce to overcome this?
The end of the world?
Flags: ms
^a(?=.)$ with "a\n" matches but shouldn’t.
^a(?!.)$ with "a\n" does not match but should.
IMO, with multiline enabled, a lookahead should ignore the line terminator (and the rest of the expression) if the following token is a ‘$’.
It is not a Java bug, and it is well specified in the specs for Java, JavaScript, and most other regex flavors. In Java, of course, the singleline option is called dotall (a more descriptive and therefore superior name), and JavaScript doesn’t have a native equivalent (the /s flag in RegexPal comes from XRegExp, which silently converts the appropriate dots to [\s\S]).
Lookahead obeys /m just fine, but /m has no impact on the dot (whereas /s does). I think you are a bit confused about what the /m and /s flags actually do (this is a common point of confusion since the flags are named so terribly). Please see https://blog.stevenlevithan.com/archives/singleline-multiline-confusing for the details.
XRegExp("^a(?=.)$", "ms").test("a\n") should and does return true. XRegExp("^a(?!.)$", "ms").test("a\n") should and does return false.
This is long-established, cross-library-consistent regex behavior (albeit confusing due to the flag names).
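JavaScript engines that implement ES2018 support the s (dotAll) flag natively, so the two XRegExp examples can also be reproduced with plain regex literals. Here /m is what lets $ match before "\n"; /s only changes what the dot matches:

```javascript
// With s, "." matches "\n", so the lookahead sees the newline.
console.log(/^a(?=.)$/ms.test("a\n")); // true
console.log(/^a(?!.)$/ms.test("a\n")); // false

// Without s, "." refuses "\n", so the two results flip.
console.log(/^a(?=.)$/m.test("a\n"));  // false
console.log(/^a(?!.)$/m.test("a\n"));  // true
```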
Thanks for all your work! I really appreciate it. It has helped me develop websites easily.
Thanks for the great library. I think I found a bug using jQuery 1.7.1 with XRegExp 1.5.0 (IE6 only), around line 270:
RegExp.prototype.exec = function (str) {
var match = real.exec.apply(this, arguments),
needs to change to:
RegExp.prototype.exec = function (str) {
str = String(str);
var match = real.exec.apply(this, arguments),
This is required by the ECMAScript standard (section 15.10.6.2). I found the fix at http://blog.slaks.net/2011/09/xregexp-breaks-jquery-animations.html, though his solution:
if (!str.slice)
str = String(str);
… caused me problems, as slice was undefined under the error condition; the unconditional cast seems to work without issue.
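The point of the fix is that an exec override must convert its argument to a string before calling string methods on it, matching the ToString step the spec gives the native exec. A minimal sketch of that pattern, using a hypothetical patchedExec function and a hypothetical rest property rather than XRegExp's real code:

```javascript
// Keep a reference to the native method before wrapping it.
const nativeExec = RegExp.prototype.exec;

// Hypothetical stand-in for an overridden exec. It coerces its argument with
// String() first, so the string methods used afterward are safe even if the
// caller passes a number.
function patchedExec(str) {
  str = String(str);
  const result = nativeExec.call(this, str);
  if (result) {
    // Illustrative post-processing that would throw a TypeError on a
    // number (numbers have no slice method) without the coercion above.
    result.rest = str.slice(result.index + result[0].length);
  }
  return result;
}

// Calling it with a number rather than a string:
const result = patchedExec.call(/^(\d+)(px)?/, 42);
console.log(result[1], JSON.stringify(result.rest)); // 42 ""
```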
Thanks for reporting this, Jim. The bug has been fixed in XRegExp 1.5.1.
Hey Steven,
I want to use XRegExp for validation. I am new to this, and my syntax is not working.
We developed an application and are now rolling it out to other countries. In the text field we should allow only characters and spaces.
I use [^a-zA-Z ]: if the user enters anything other than these, I replace it with blank. We want something similar for Portuguese and Chinese: a Brazilian user should be able to enter only Portuguese characters and spaces, and similarly for Chinese. I tried the XRegExp below:
var reg = XRegExp('^\\p{InBasic_Latin}+$');
XRegExp.replace('horas são', reg, '');
Thanks in advance.
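One possible approach, assuming an engine that supports ES2018 Unicode property escapes (the u flag), skips XRegExp entirely and mirrors the original [^a-zA-Z ] idea: strip every character that is neither a space nor a letter of the desired script. Note that \p{InBasic_Latin} covers only U+0000 through U+007F, so "ã" (U+00E3) falls outside it, and that \p{Script=Latin} is broader than Portuguese alone. keepLatin and keepHan are hypothetical helper names.

```javascript
// Native ES2018 Unicode property escapes (u flag); not XRegExp's API.
function keepLatin(text) {
  // NFC first, so "ã" is one Latin code point rather than "a" plus a
  // combining tilde (which belongs to the Inherited script, not Latin).
  return text.normalize("NFC").replace(/[^\p{Script=Latin} ]/gu, "");
}

function keepHan(text) {
  // Keeps Chinese (Han) characters and spaces; strips everything else.
  return text.replace(/[^\p{Script=Han} ]/gu, "");
}

console.log(keepLatin("horas são 10h")); // → "horas são h"
console.log(keepHan("中文 abc"));          // → "中文 "
```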