parseUri 2.0: A mighty but tiny URI parser

I created parseUri v1 17 years ago, but never hosted it on GitHub or npm because it’s older than both of those tools. Nevertheless, it’s been used very widely ever since, because it’s tiny and predates JavaScript’s built-in URL constructor. After this short gap, I just released v2: github.com/slevithan/parseuri. It’s still tiny (nothing similar comes close, even among libraries that support far fewer URI parts, types, and edge cases), and it has several advantages over URL:

  • parseUri gives you many additional properties (authority, userinfo, subdomain, domain, tld, resource, directory, filename, suffix) that are not available from URL.
  • URL throws if, for example, it isn’t given a protocol, and in many other cases of valid (but unsupported) and invalid URIs. parseUri makes a best-effort attempt even with partial or invalid URIs and is extremely good with edge cases.
  • URL’s rules don’t allow correctly handling many non-web protocols. For example, URL doesn’t throw on any of 'git://localhost:1234', 'ssh://myid@192.168.1.101', or 't2ab:///path/entry', but it also doesn’t get their details correct since it treats everything after : up to ? or # as part of the pathname.
  • parseUri includes a “friendly” parsing mode (in addition to its default mode) that handles human-friendly URLs like 'example.com/index.html' as expected.
  • parseUri includes partial/extensible support for second-level domains like in '//example.co.uk'.
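
To make the first two points concrete, here’s a minimal sketch that uses only the built-in URL constructor. The tryParse helper is a name made up for this example, and no parseUri calls are shown:

```javascript
// Wrap the built-in URL constructor so a failed parse returns null
// instead of throwing. (tryParse is a hypothetical helper name.)
function tryParse(input) {
  try {
    return new URL(input);
  } catch (err) {
    // URL throws a TypeError when it can't parse the input
    return null;
  }
}

// No protocol and no base URL, so URL refuses both of these
const friendly = tryParse('example.com/index.html');
const protocolRelative = tryParse('//example.co.uk');
console.log(friendly, protocolRelative); // null null

// A fully qualified web URL parses fine...
const full = tryParse('https://www.example.co.uk/dir/file.html?q=1#top');
console.log(full.hostname); // 'www.example.co.uk'
console.log(full.pathname); // '/dir/file.html'

// ...but properties such as tld or filename are not part of URL
console.log(full.tld, full.filename); // undefined undefined
```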

Conversely, parseUri is single-purpose and doesn’t do normalization. But of course you can pass URIs through a normalizer separately, if you need that. Or, if you wanted to create an exceptionally lightweight URI normalizer, parseUri would be a great base to build on top of. 😊

So although it’s needed less often these days because of the built-in URL, if URL is ever not enough for your needs, this is an extremely accurate, flexible, and lightweight option.

Check it out!

Regex & Facebook Experience Tech Talk

I recently gave a talk for a group of university students studying web development in Belgrade, Serbia. In it, I talked about my background and my experience starting in tech and eventually working at Facebook for seven years. Most relevant to this blog, I also spent about 20 minutes showing off some cool and advanced regular expressions: regexes for converting Fahrenheit to Celsius, deleting repeated words, deleting non-adjacent duplicate lines, checking password complexity, reformatting names, adding thousands separators, matching balanced parentheses, and matching palindromes.
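
To give a flavor of two of these, here are simple JavaScript takes on the repeated-words and thousands-separators problems. These are common patterns and not necessarily the exact regexes shown in the talk, and the function names are made up for illustration:

```javascript
// Collapse runs of a repeated word ("the the the" becomes "the").
// \1 refers back to the captured word; the i flag makes the
// comparison case-insensitive, and the first occurrence is kept.
function removeRepeatedWords(text) {
  return text.replace(/\b(\w+)(?:\s+\1\b)+/gi, '$1');
}

// Add thousands separators to a string of digits (integers only).
// The lookahead matches any position inside the number that is
// followed by a multiple of three digits and then no more digits.
function addThousandsSeparators(digits) {
  return digits.replace(/\B(?=(?:\d{3})+(?!\d))/g, ',');
}

console.log(removeRepeatedWords('the the quick brown fox fox fox'));
// 'the quick brown fox'
console.log(addThousandsSeparators('1234567'));
// '1,234,567'
```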

You can find the video and slides here (Lecture at SAE Belgrade: Lessons from Facebook) or you can watch the video on YouTube (with the regex portion starting at 52:26).

New Blog ‘Lifecurious’ at slev.life

Flagrant Badassery has been pretty quiet for years now, but it's still my home for all things JavaScript and regular expression related. That said, I've just launched a shiny new blog where I'm posting about everything unrelated to programming.

Want to learn about aphantasia and hyperphantasia, the time I unmasked cult leader Karen Zerby, why South Dakota residency is an ideal choice for nomads, or discover the best English teaching books?

All of these and more await at Lifecurious.


XRegExp 3.0.0!

After 3+ years, XRegExp 3.0.0 has been released. Standout features are dramatically better performance (many common operations are 2x to 50x faster) and support for full 21-bit Unicode (thanks to Mathias Bynens). I’ve also just finished updating all the documentation on xregexp.com so go check that out. 🙂

If you haven’t used XRegExp before, it’s an MIT-licensed JavaScript library that provides augmented (and extensible!) regular expressions. You get new modern syntax and flags beyond what browsers support natively. XRegExp is also a regex utility belt with tools to make your client-side grepping and parsing easier, while freeing you from worrying about pesky cross-browser inconsistencies and things like manually manipulating lastIndex or slicing strings when tokenizing.
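
For anyone who hasn’t had to do that bookkeeping by hand, here’s a rough native-only sketch of the chore. It doesn’t use XRegExp at all, it assumes an engine that supports the sticky (y) flag, and the token types and the tokenize name are made up for illustration:

```javascript
// Each token type gets a sticky regex, which only matches
// exactly at the position stored in its lastIndex property.
const tokenTypes = [
  { type: 'number', regex: /\d+/y },
  { type: 'word', regex: /[A-Za-z]+/y },
  { type: 'space', regex: /\s+/y },
];

function tokenize(str) {
  const tokens = [];
  let pos = 0;
  while (pos < str.length) {
    let matched = false;
    for (const { type, regex } of tokenTypes) {
      // lastIndex has to be set by hand before every attempt
      regex.lastIndex = pos;
      const match = regex.exec(str);
      if (match) {
        tokens.push({ type, value: match[0] });
        // After a successful match, lastIndex points just past it
        pos = regex.lastIndex;
        matched = true;
        break;
      }
    }
    if (!matched) {
      throw new SyntaxError(`Unexpected character at index ${pos}`);
    }
  }
  return tokens;
}

console.log(tokenize('abc 123').map((t) => t.type));
// [ 'word', 'space', 'number' ]
```

Forgetting to reset lastIndex, or reusing a regex that still holds a stale value, is the usual source of bugs with this approach.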

Version 3.0.0 has lots of additional features, options, fine tuning, cross-browser fixes, some new simplified syntax, and thousands of new tests. And it still supports all the browsers. Check out the long list of changes. There are a few minor breaking changes that shouldn’t affect most people and have easy workarounds. I’ve listed them all below, but see the full changelog if you need more details about them.

  • XRegExp.forEach no longer accepts or returns its context. Use binding with the provided callback instead.
  • Moved character data for Unicode category L (Letter) from Unicode Base to Unicode Categories. This has no effect if you’re already using Unicode Categories or XRegExp-All.
  • Using the same name for multiple named capturing groups in a single regex is now a SyntaxError.
  • Removed the 'all' shortcut used by XRegExp.install/uninstall.
  • Removed the Prototypes addon, which added methods apply, call, forEach, globalize, xexec, and xtest to XRegExp.prototype. These were all just aliases of methods on the XRegExp object.
  • A few changes affect custom addons only: changed the format for providing custom Unicode data, replaced XRegExp.addToken’s trigger and customFlags options with new flag and optionalFlags options, and removed the this.hasFlag function previously available within token definition functions.

You can download the new release on GitHub or install via npm. I’d love to hear feedback and common regex-related use cases that you think could be simplified via new XRegExp features. Let me know here or in GitHub issues. Thanks!

Regex Cookbook 2nd Edition Contest Winners

For the last six days, I've been running a contest on my blog to win one of 15 free copies of Regular Expressions Cookbook, 2nd Edition. Thanks to all who participated and spread the word! I've read every one of your comments, and appreciate the congratulations and the great comments about the first edition, etc. There's a lot of love for regular expressions out there!

Of the almost 400 who requested a copy, people specifically mentioned that they wanted an ebook rather than a print copy at a ratio of about 0.7 to 1, which was a lot higher than I expected. That bodes well for Regular Expressions Cookbook, though, considering that the first edition was O'Reilly Media's bestselling ebook of 2010.

Without further ado, the randomly-selected contest winners are (drumroll)…

  1. Travis Hardiman
  2. Tony Garcia
  3. Jennifer Lumer
  4. Bessie Chan
  5. Derek Brigner
  6. Marco Antonio
  7. Kent Isaly
  8. Marcus Barnes
  9. Kennie Cruz
  10. Jacob Christiansen
  11. Stafleu Buitendag
  12. Michael Mongeau
  13. Rod Vagg
  14. Henk Scholten
  15. Wim Mostmans

I've contacted all of the winners by email. If you haven't already heard from me, check your spam folder.

I've additionally chosen two people to send signed copies to: Edward Beckett and Drew Bennett. 🙂

For those who want to buy a copy, you can get it directly from O'Reilly Media, where it's on sale until September 18th, using code B2S2. O'Reilly offers DRM-free ebook copies in PDF, Mobi, and ePub formats. The book is also available with a discount and free shipping from Amazon.

Amazon also sells ebook copies for Kindle.

Congratulations to all winners, and thanks again to everyone else for participating! If you want to know more about the book, check out Rob Friesel Jr's detailed review of the second edition.