JavaScript — Flagrant Badassery

Oniguruma-To-ES: Translate Oniguruma regexes to native JavaScript

A couple months ago I launched a new open source project, Oniguruma-To-ES. You can use it to:

Take advantage of Oniguruma's many extended regex features in JavaScript.
Run regexes written for Oniguruma from JavaScript, such as those used in TextMate grammars (used by VS Code, Shiki syntax highlighter, etc.).
Share regexes across your Ruby and JavaScript code.^✳︎

Oniguruma-To-ES deeply understands the hundreds of large and small differences between Oniguruma and JavaScript regex syntax and behavior, across multiple JavaScript version targets. It's obsessive about ensuring that the emulated features it supports have exactly the same behavior, even in extreme edge cases. And it's been battle-tested on thousands of real-world Oniguruma regexes used in TextMate grammars (via the Shiki library).

Depending on features used, Oniguruma-To-ES might use advanced emulation via a RegExp subclass (that remains a native JavaScript regular expression).

There's also a demo page. Check it out!

^{✳︎: Ruby 2.0+ uses Onigmo, a fork of Oniguruma with similar syntax and behavior.}

Regexes Got Good: The History and Future of Regular Expressions in JavaScript

Over at Smashing Magazine, I just published a 3,100-word (no fluff) article evaluating the history, present state, and future of regexes in JavaScript, with lots of tips. I think all JavaScript developers will learn something from this, and I'd love to know what was new for you!

Check it out: Regexes Got Good: The History And Future Of Regular Expressions In JavaScript.

Launching: Regex+ and Regex Colorizer v1

I've recently released two new open source regex libraries.

Regex+

Regex+ provides a regex template tag for dynamically creating readable, high performance, native JavaScript regular expressions with best practices built-in and powerful/advanced features that improve performance, readability, and maintainability. It's lightweight and supports all ES2025 regex features.

Highlights include support for free-spacing and comments, atomic groups and possessive quantifiers (that can help you avoid ReDoS), subroutines and subroutine definition groups (that enable powerful subpattern composition), and context-aware interpolation of RegExp instances, escaped strings, and partial patterns.

Run npm install regex in your JavaScript project and then simply import {regex} from 'regex' to get started. Check out examples and more on GitHub.

Regex Colorizer 1.0

Regex Colorizer is a project that highlights regex syntax, for use in blogs, docs, and regex tools. I created it for old-school RegexPal back in 2007, but it hadn't received any updates since 2010. Until now. Version 1.0 is a significant update that improves its API and adds support for modern JavaScript regex syntax and flags. Check out the demo!

parseUri 2.0: A mighty but tiny URI parser

I created parseUri v1 17 years ago, but never hosted it on GitHub/npm because it's older than both of those tools. Nevertheless, it’s been used very widely ever since due to it being tiny and predating JavaScript’s built-in URL constructor. After this short gap, I just released v2: github.com/slevithan/parseuri. It’s still tiny (nothing similar comes close, even with libraries that support far fewer URI parts, types, and edge cases), and it includes several advantages over URL:

parseUri gives you many additional properties (authority, userinfo, subdomain, domain, tld, resource, directory, filename, suffix) that are not available from URL.
URL throws e.g. if not given a protocol, and in many other cases of valid (but not supported) and invalid URIs. parseUri makes a best case effort even with partial or invalid URIs and is extremely good with edge cases.
URL’s rules don’t allow correctly handling many non-web protocols. For example, URL doesn’t throw on any of 'git://localhost:1234', 'ssh://myid@192.168.1.101', or 't2ab:///path/entry', but it also doesn’t get their details correct since it treats everything after : up to ? or # as part of the pathname.
parseUri includes a “friendly” parsing mode (in addition to its default mode) that handles human-friendly URLs like 'example.com/index.html' as expected.
parseUri includes partial/extensible support for second-level domains like in '//example.co.uk'.

Conversely, parseUri is single-purpose and doesn’t do normalization. But of course you can pass URIs through a normalizer separately, if you need that. Or, if you wanted to create an exceptionally lightweight URI normalizer, parseUri would be a great base to build on top of. 😊

So although it’s needed less often these days because of the built-in URL, if URL is ever not enough for your needs, this is an extremely accurate, flexible, and lightweight option.

Check it out!

parseUri on GitHub
The demo page lets you easily test and compare results
Changelog
Tests

XRegExp 3.0.0!

After 3+ years, XRegExp 3.0.0 has been released. Standout features are dramatically better performance (many common operations are 2x to 50x faster) and support for full 21-bit Unicode (thanks to Mathias Bynens). I’ve also just finished updating all the documentation on xregexp.com so go check that out. 🙂

If you haven’t used XRegExp before, it’s an MIT licensed JavaScript library that provides augmented (and extensible!) regular expressions. You get new modern syntax and flags beyond what browsers support natively. XRegExp is also a regex utility belt with tools to make your client-side grepping and parsing easier, while freeing you from worrying about pesky cross-browser inconsistencies and things like manually manipulating lastIndex or slicing strings when tokenizing.

Version 3.0.0 has lots of additional features, options, fine tuning, cross-browser fixes, some new simplified syntax, and thousands of new tests. And it still supports all the browsers. Check out the long list of changes. There are a few minor breaking changes that shouldn’t affect most people and have easy workarounds. I’ve listed them all below, but see the full changelog if you need more details about them.

XRegExp.forEach no longer accepts or returns its context. Use binding with the provided callback instead.
Moved character data for Unicode category L (Letter) from Unicode Base to Unicode Categories. This has no effect if you’re already using Unicode Categories or XRegExp-All.
Using the same name for multiple named capturing groups in a single regex is now a SyntaxError.
Removed the 'all' shortcut used by XRegExp.install/uninstall.
Removed the Prototypes addon, which added methods apply, call, forEach, globalize, xexec, and xtest to XRegExp.prototype. These were all just aliases of methods on the XRegExp object.
A few changes affect custom addons only: changed the format for providing custom Unicode data, replaced XRegExp.addToken’s trigger and customFlags options with new flag and optionalFlags options, and removed the this.hasFlag function previously available within token definition functions.

You can download the new release on GitHub or install via npm. I’d love to hear feedback and common regex-related use cases that you think could be simplified via new XRegExp features. Let me know here or in GitHub issues. Thanks!