XRegExp 3.0.0!

After 3+ years, XRegExp 3.0.0 has been released. Standout features are dramatically better performance (many common operations are 2x to 50x faster) and support for full 21-bit Unicode (thanks to Mathias Bynens). I’ve also just finished updating all the documentation on xregexp.com, so go check that out. 🙂

If you haven’t used XRegExp before, it’s an MIT-licensed JavaScript library that provides augmented (and extensible!) regular expressions. You get new, modern syntax and flags beyond what browsers support natively. XRegExp is also a regex utility belt with tools that make your client-side grepping and parsing easier, while freeing you from worrying about pesky cross-browser inconsistencies and chores like manually manipulating lastIndex or slicing strings when tokenizing.
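For a sense of the bookkeeping that last sentence refers to, here is a minimal sketch of a tokenizer in plain JavaScript, without XRegExp. The tokenize function and its three token types are made up for illustration, and the sketch assumes an engine that supports the native sticky (y) flag.

```javascript
// Hypothetical tokenizer written with native regexes only (no XRegExp).
// Every pattern is sticky, so exec() can only match exactly at lastIndex.
function tokenize(input) {
  const patterns = [
    ['number', /\d+/y],
    ['word', /[A-Za-z]+/y],
    ['space', /\s+/y],
  ];
  const tokens = [];
  let pos = 0;
  while (pos < input.length) {
    let matched = false;
    for (const [type, regex] of patterns) {
      regex.lastIndex = pos; // manual bookkeeping: where this attempt starts
      const m = regex.exec(input);
      if (m) {
        if (type !== 'space') tokens.push({ type, value: m[0] });
        pos = regex.lastIndex; // manual bookkeeping: where the next attempt starts
        matched = true;
        break;
      }
    }
    if (!matched) {
      throw new SyntaxError('Unexpected character at position ' + pos);
    }
  }
  return tokens;
}

const tokens = tokenize('abc 123 x');
console.log(tokens);
```

Every read and write of lastIndex here is a place where a bug can hide, which is the kind of chore a helper library can take off your hands.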

Version 3.0.0 has lots of additional features, options, fine-tuning, cross-browser fixes, some new simplified syntax, and thousands of new tests. And it still supports all the browsers. Check out the long list of changes. There are a few minor breaking changes that shouldn’t affect most people and have easy workarounds. I’ve listed them all below, but see the full changelog if you need more details about them.

  • XRegExp.forEach no longer accepts or returns its context. Use binding with the provided callback instead.
  • Moved character data for Unicode category L (Letter) from Unicode Base to Unicode Categories. This has no effect if you’re already using Unicode Categories or XRegExp-All.
  • Using the same name for multiple named capturing groups in a single regex is now a SyntaxError.
  • Removed the 'all' shortcut used by XRegExp.install/uninstall.
  • Removed the Prototypes addon, which added methods apply, call, forEach, globalize, xexec, and xtest to XRegExp.prototype. These were all just aliases of methods on the XRegExp object.
  • A few changes affect custom addons only: changed the format for providing custom Unicode data, replaced XRegExp.addToken’s trigger and customFlags options with new flag and optionalFlags options, and removed the this.hasFlag function previously available within token definition functions.
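As a rough illustration of the first item in the list, the sketch below shows the binding workaround. It uses a hypothetical stand-in named forEachMatch, written with native regexes so the snippet runs on its own. It is not XRegExp's implementation, but it assumes the same callback arguments (match, index, string, regex). A callback bound like this can be passed to XRegExp.forEach in the same way.

```javascript
// Hypothetical stand-in for a forEach-style helper that takes no context argument.
// It calls the callback once per match with (match, index, str, regex).
function forEachMatch(str, regex, callback) {
  const global = regex.flags.includes('g')
    ? regex
    : new RegExp(regex.source, regex.flags + 'g');
  let index = 0;
  for (const match of str.matchAll(global)) {
    callback(match, index++, str, regex);
  }
}

const evens = [];

// The helper has no context parameter, so bind `this` on the callback itself.
// An arrow function that closes over `evens` would work just as well.
forEachMatch('1a2345', /\d/, function (match, i) {
  if (i % 2) this.push(+match[0]);
}.bind(evens));

console.log(evens); // [ 2, 4 ]
```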

You can download the new release on GitHub or install via npm. I’d love to hear feedback and common regex-related use cases that you think could be simplified via new XRegExp features. Let me know here or in GitHub issues. Thanks!

Win a Free Copy of Regex Cookbook 2nd Edition

Update: This contest is now finished. See the list of winners.

I'm excited to announce the release of Regular Expressions Cookbook 2nd Edition, which I wrote together with regex superguru Jan Goyvaerts. It has actually been available as an ebook for a couple of weeks on oreilly.com, but as of now, it is also in stock on amazon.com.

To promote this release, O'Reilly Media is giving away free copies to 15 people who comment on this post on or before September 7th! To get a free copy, you must read the details at the end of this post. But first, some FAQs about the second edition:

Wait…this is a cookbook?

The book tackles hundreds of real-world regular expression tasks in a problem, solution, discussion format. There's also a detailed regular expression tutorial included, in the same format. Check out Jeff Atwood's and Ben Nadel's reviews of the first edition, which have more details about this.

Update: Rob Friesel Jr. posted an awesome, detailed review of the second edition.

The first edition was a bestseller, and is now available in eight languages. It briefly held Amazon's #1 spot for computer books upon its release in mid-2009, and the ebook version was O'Reilly's top seller of 2010.

What has changed with this new edition?

The second edition adds more content and updates the existing chapters. There are innumerable improvements. The most noticeable are a new chapter written by Jan, titled Source Code and Log Files, and various new recipes spread across four of the other chapters.

There are 101 new pages in the second edition, and that's after shortening and removing some content from the first edition. There were 125 recipes in the first edition, upped to 146 in the second. Note that many of the book's recipes provide solutions and in-depth discussions for more than one problem. Tons of changes, ranging from minor copyedits and errata corrections to major revisions and the addition of significant new content, were made throughout the existing content. Everything was brought up to date with the latest standards, tools, and programming language versions. In particular, updates to Java and Perl since the first edition brought very significant regular expression changes. Plus, we've covered some advanced regular expression features that already existed the last time around, but didn't make it into the first edition.

The first edition was already groundbreaking for the depth of its explanations and its equal coverage of all regexes in eight programming languages (C#, Java, JavaScript, Perl, PHP, Python, Ruby, and VB.NET). The second edition significantly improves upon this by providing even more details about the many peculiarities and differences of the APIs, syntax, and behavior of these regex flavors and programming languages. The second edition also adds coverage of XRegExp when it provides a better solution than native JavaScript. IMO, the second edition is easily the most comprehensive source of information about the use of modern regexes across multiple programming languages—far more detailed than anything else in print or online.

What will interest long-term fans the most?

Even though there are lots of important changes throughout the book, the new recipes and the updated coverage for the latest programming languages are probably the main reasons for owners of the first edition to upgrade. The fully new recipes cover creating a regex-based parser, validating password complexity, adding thousands separators to numbers, matching various kinds of numbers, decoding XML entities, and everything in the new Source Code and Log Files chapter. The coverage of XRegExp is also completely new in the second edition.

What will cause readers to trip over themselves in their haste to buy a copy?

If you read through the book, you'll learn about a lot of things, much more than just regular expressions. You'll almost certainly learn something new about Unicode, phone numbers, and XML, as just a few examples. You'll learn that the eighth floor of the Saks Fifth Avenue store in New York City has its own ZIP code, which also happens to be the only ZIP code that includes letters. You'll learn that the Chicago Manual of Style and Merriam-Webster's Biographical Dictionary disagree on the correct alphabetical listing of the name Charles de Gaulle (my girlfriend and I are in opposing camps). Jan and I put a ton of research into the book, and we pay attention to details. I think that shines through.

Oh yeah, and along the way, you'll also become a Master Chef of regular expressions, able to slice and dice text with the best of them. But not everyone will want to actually read through the book. Some readers will prefer to take advantage of the cookbook format and read only the parts that solve their immediate problems. That's fine, too.

Many developers complain that they're continually relearning regular expressions, going back to the reference documents every time they need to write a new regex. The problem/solution approach of Regular Expressions Cookbook means you learn by doing, and we think that helps the details stick with you more securely than with the other books and websites out there.

Many regex novices turn to Google to get prewritten regexes that solve their problems. Unfortunately, if you're not already fluent in regex, you won't realize that 90% of the regexes out there have some kind of problem, be it returning false positives or negatives, performing inefficiently (or maybe even crashing your server when fed malicious data), being more complicated than necessary, not being portable, or what have you. When you use the regexes in Regular Expressions Cookbook, not only do you get detailed coverage of all the related issues (which helps you customize the solution, if necessary), you also get the peace of mind that you're using proven solutions by real subject-matter experts.

So how do I enter to win a free copy?

Simply comment on this blog post on or before 11:59 PM EDT on September 7th, and you'll be in the running. I wish I could leave the contest open a bit longer, but I'll be moving to California to work for a little Internet startup. You'll need to use your actual first and last name and email address with your comment. Names are published, but email addresses are not. Each person commenting has only one chance to win, regardless of how many comments they post. If you don't know what to write in your comment, just mention whether you'd prefer a print or ebook copy.

Shortly after this contest ends, I'll randomly choose 15 winners and contact them by email. If you prefer a printed copy, I'll be asking for your address and, if you're outside of the U.S., your phone number. O'Reilly will pay for shipping to anywhere in the world. Good luck!

Coauthor Jan Goyvaerts has written up his own summary of the changes in What's New in The Second Edition of Regular Expressions Cookbook.


XRegExp Updates

A few days ago, I posted a long-overdue XRegExp bug fix release (version 1.5.1). This was mainly to address an IE issue that a number of people have written to me and blogged about. Specifically, RegExp.prototype.exec no longer throws an error in IE when it is simultaneously provided a nonstring argument and called on a regex with a capturing group that matches an empty string. That's an edge case of an edge case, but it was causing XRegExp to conflict with jQuery 1.7.1 (oops). You can see the full list of changes in the changelog.
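To make that edge case concrete, here is a small sketch of the two conditions combined, run against a native regex in a current, spec-compliant engine. It only shows the behavior you would expect from exec in that situation. It does not reproduce the IE bug or the XRegExp code that was fixed.

```javascript
// A capturing group that can match an empty string...
const emptyGroupRegex = /(x?)/;

// ...combined with a nonstring argument, which exec() converts to the string "123".
const execResult = emptyGroupRegex.exec(123);

console.log(execResult[0] === '');        // true: the overall match is empty
console.log(execResult[1] === '');        // true: the group participated and matched ""
console.log(execResult.index);            // 0
console.log(execResult.input === '123');  // true: the input was coerced to a string
```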

But wait, there's more… XRegExp's Unicode plugin has been updated to support Unicode 6.1 (released January 2012), rather than Unicode 5.2. I've also added a new test suite with 265 tests so far, and more on the way.

More substantial changes to XRegExp are planned and coming soon. Follow the brand new XRegExp repository on GitHub to keep up to date or to fork it and help shape the future of this one-of-a-kind JavaScript library. 🙂

Regex Syntax Highlighter

Do you regularly post regular expressions online? Have you seen the regex syntax highlighting in RegexPal, RegexBuddy, or on my blog (example), and wanted to apply it to your own websites? Prompted by blog reader Mark McDonnell, I've extracted the regex syntax highlighting engine built into RegexPal and made it into its own library, unimaginatively named JavaScript Regex Syntax Highlighter. When combined with the provided CSS, this 1.6 KB self-contained JavaScript file can be used, for instance, to automatically apply regex syntax highlighting to any HTML element with the "regex" class. You can see an example of doing just that on my quick and dirty test page.

Highlighting example: <table\b[^>]*>(?:(?=([^<]+))\1|<(?!table\b[^>]*>))*?</table>

Although the library is simple (there's just one function to call), the syntax highlighting is pretty advanced and handles all valid JavaScript regex syntax and errors (with errors highlighted in red). An example of its advanced highlighting support is that it knows, based on the context, whether \10 is backreference 10, backreference 1 followed by a literal zero, octal character index 10, or something else altogether due to its position in the surrounding pattern. Speaking of octal escapes (which are de facto browser extensions, not part of the spec), they are correctly highlighted according to their subtle differences inside and outside character classes (outside of character classes only, octals can include a fourth digit if the leading digit is a zero). As far as I'm aware, this is the first JavaScript library for highlighting regex syntax, with or without the level of completeness included here.

For people who might feel inclined to use or improve upon my work, I've made the licensing as permissive as possible to avoid getting in your way. RegexPal is already open source under the GNU LGPL 3.0 License, but this new library is released under the MIT License. If you plan to customize or help upgrade this code, note that it could probably use a bit of an overhaul (it's ripped from RegexPal with minimal modification), and might require one if you want to cleanly add support for additional regex flavors.

Another nifty feature I plan to eventually add is explanatory title attributes for each element in the returned HTML, which might be particularly helpful for deciphering any highlighted errors or warnings. Let me know if this library is useful for you, or if there are any other features you'd like to see added or changed. Thanks!

Link: JavaScript Regex Syntax Highlighter.
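To illustrate the context dependence of \10 mentioned above, here is a short sketch using native regexes only. It does not use the highlighter library. It assumes a current engine that follows the ECMAScript specification, including its Annex B rules for web-compatible regex syntax. Older browsers may have resolved the ambiguous case differently.

```javascript
// Ten capturing groups exist, so \10 is a backreference to group 10 ("j").
const tenGroups = /(a)(b)(c)(d)(e)(f)(g)(h)(i)(j)\10/;
const backrefResult = tenGroups.test('abcdefghijj');

// Only one group exists, so \10 cannot be a backreference to group 10.
// Under the Annex B rules it is read as an octal escape for U+0008.
const oneGroup = /(a)\10/;
const octalResult = oneGroup.test('a\u0008');
const backrefOnePlusZero = oneGroup.test('aa0'); // not how current engines read it

// Inside a character class, \10 is likewise an octal escape.
const classResult = /[\10]/.test('\u0008');

// With the u flag the legacy reading is not available,
// so the same pattern is rejected as a syntax error.
let unicodeModeError = null;
try {
  new RegExp('(a)\\10', 'u');
} catch (e) {
  unicodeModeError = e;
}

console.log(backrefResult, octalResult, backrefOnePlusZero, classResult);
console.log(unicodeModeError instanceof SyntaxError);
```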

Five Free Copies of Upcoming O’Reilly Book ‘High Performance JavaScript’

Update (2010-02-25): This contest is now closed.

Last year, Yahoo! engineer and all-around JavaScript badass Nicholas Zakas asked if I was interested in writing a chapter for a new book on JavaScript performance that he was working on. I agreed, and that book, High Performance JavaScript, is now available for preorder at Amazon and other fine book retailers.

In addition to the wide-ranging content by Nicholas and a chapter on string and regular expression performance by yours truly, chapters were also contributed by an awesome lineup of JavaScript performance gurus: Ross Harmes, Julien Lecomte, Stoyan Stefanov, and Matt Sweeney. This book is unique in its laser focus on optimizing the performance of your JavaScript applications, and covers many advanced topics in the process. The chapter on strings and regular expressions provides what I think is easily the most in-depth coverage of cross-browser JavaScript regex performance currently available.

Here's the list of chapters:

  1. Loading and Execution
  2. Data Access
  3. DOM Scripting (Stoyan Stefanov)
  4. Algorithms and Flow Control
  5. Strings and Regular Expressions (Steven Levithan)
  6. Responsive Interfaces
  7. Ajax (Ross Harmes)
  8. Programming Practices
  9. Build and Deployment (Julien Lecomte)
  10. Tools (Matt Sweeney)

To celebrate the completion of this book, I was originally giving away three copies, but O'Reilly Media increased the offer to five books! All you need to do is comment on this post by February 24th, and I'll pick five people to send a copy to as soon as it's released (Amazon says March 15th). If you prefer, I'd be happy to send you a copy of Regular Expressions Cookbook instead (please note which book you want in your comment). Four winners will be chosen at random from the pool of unique commenters (I'll be tracking IPs), and the fifth based on the reason given for why you want a copy.

Make sure to include your email address in the comment form, since I'll need it to contact you if you're selected (your email address won't be used for any other purpose). Good luck, and congratulations to Nicholas Zakas and all the other authors on completing a fantastic new book!

Edit (2010-02-05): My blog has been offline more often than not for the first two days after posting this, and many people have reported that they were unable to post a comment. I apologize for the screw-up—my blog is now on a different server, and the problems should be resolved. Please try again!

Edit (2010-02-08): O'Reilly Media kindly offered to pick up the tab for this giveaway, and increased the winnings to five books!

Edit (2010-02-09): Nicholas Zakas posted more information about High Performance JavaScript on his blog: Announcing High Performance JavaScript.

Edit (2010-02-25): This contest is now closed. Winners will be announced here shortly.

Edit (2010-03-03): Following are the winners of this giveaway (the first four were chosen randomly):

  1. David Henderson
  2. Daniel Trebbien
  3. Lea Verou
  4. Stefan "schnalle" Schallerl
  5. Adam Crabtree

No. 5 Adam Crabtree, who wants to review the book and share it with members of the DallasJS Meetup Group, wins the nonrandom drawing for the best reason to win a copy. Runners-up for this selection were Yoav, who promised to donate the book to a high school library after he's done with it; Nick Carter, who threatened me with his wrath if he doesn't win (I'll have to endure); Paul Irish, who kindly offered to have my last name corrected (to that of a sea monster) in exchange for winning; Alexei, a technical editor of a couple of Nicholas Zakas's previous books who'd like to know how many errors this one contains; and Marcel Korpel, who wants to improve his users' health by reducing the "headaches, general stress and insomnia" they suffer while waiting on his websites. 🙂

The winners have been informed by email about how to collect their prize. Thanks to everyone for playing!