parseUri 2.0: A mighty but tiny URI parser

I created parseUri v1 17 years ago, but never hosted it on GitHub or npm because it’s older than both of those tools. Nevertheless, it’s been used very widely ever since, because it’s tiny and predates JavaScript’s built-in URL constructor. After this short gap, I just released v2: github.com/slevithan/parseuri. It’s still tiny (nothing similar comes close, even among libraries that support far fewer URI parts, types, and edge cases), and it has several advantages over URL:

  • parseUri gives you many additional properties (authority, userinfo, subdomain, domain, tld, resource, directory, filename, suffix) that are not available from URL.
  • URL throws in many cases, e.g. when it isn’t given a protocol, and for many other valid (but unsupported) and invalid URIs. parseUri makes a best-effort attempt even with partial or invalid URIs and is extremely good with edge cases.
  • URL’s rules don’t allow correctly handling many non-web protocols. For example, URL doesn’t throw on any of 'git://localhost:1234', 'ssh://myid@192.168.1.101', or 't2ab:///path/entry', but it also doesn’t get their details correct since it treats everything after : up to ? or # as part of the pathname.
  • parseUri includes a “friendly” parsing mode (in addition to its default mode) that handles human-friendly URLs like 'example.com/index.html' as expected.
  • parseUri includes partial/extensible support for second-level domains, as in '//example.co.uk'.
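
To make the first two points concrete, here is a minimal sketch that uses only the built-in URL constructor. The helper names tryBuiltInUrl and splitPath are made up for illustration and are not part of parseUri. splitPath only approximates the kind of detail that properties like directory and filename provide, and it says nothing about how parseUri works internally.

```javascript
// Hypothetical helper (not part of parseUri): returns a URL object,
// or null in cases where the built-in constructor would throw.
function tryBuiltInUrl(input) {
  try {
    return new URL(input);
  } catch (err) {
    return null; // URL throws a TypeError on input it can't parse
  }
}

// Hypothetical helper (not part of parseUri): splits a pathname into
// directory and filename parts, which URL doesn't expose on its own.
function splitPath(pathname) {
  const lastSlash = pathname.lastIndexOf('/');
  return {
    directory: pathname.slice(0, lastSlash + 1),
    filename: pathname.slice(lastSlash + 1),
  };
}

// No protocol and no base URL, so the constructor throws and we get null.
const noProtocol = tryBuiltInUrl('example.com/index.html');

// With a protocol, URL parses fine, but the path still has to be split by hand.
const withProtocol = tryBuiltInUrl('https://example.com/dir/index.html?q=1#top');
const parts = splitPath(withProtocol.pathname);
```

A wrapper like this avoids the exception, but it still gives you nothing to work with for the protocol-less input, which is the gap the “friendly” mode is meant to fill.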

Conversely, parseUri is single-purpose and doesn’t do normalization. But of course you can pass URIs through a normalizer separately, if you need that. Or, if you wanted to create an exceptionally lightweight URI normalizer, parseUri would be a great base to build on top of. 😊

So although it’s needed less often these days because of the built-in URL, if URL is ever not enough for your needs, this is an extremely accurate, flexible, and lightweight option.

Check it out!

More URI-Related UDFs

As a follow-up to my parseUri() function, here are several more UDFs I've written recently to help with URI management:

  • getPageUri()
    Returns a struct containing the relative and absolute URIs of the current page. The difference between getPageUri().relative and CGI.SCRIPT_NAME is that the former will include the query string, if present.
  • matchUri(testUri, [masterUri])
    Returns a Boolean indicating whether or not two URIs are the same, disregarding the following differences:
    • Fragments (page anchors), e.g., "#top".
    • Inclusion of "index.cfm" in paths, e.g., "/dir/" vs. "/dir/index.cfm" (supports trailing query strings).
    If masterUri is not provided, the current page is used for comparison (supports both relative and absolute URIs).
  • replaceUriQueryKey(uri, key, substring)
    Replaces a URI query key and its value with a supplied key=value pair. Works with relative and absolute URIs, as well as standalone query strings (with or without a leading "?"). This is also used to support the following two UDFs:
  • addUriQueryKey(uri, key, value)
    Removes any existing instances of the supplied key, then appends it together with the provided value to the provided URI.
  • removeUriQueryKey(uri, key)
    Removes one or more query keys (comma delimited) and their values from the provided URI.
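
For readers who don't use ColdFusion, the relationship between the last two UDFs can be sketched in JavaScript. This is a rough analogue written from the descriptions above, not a port of the actual source. It assumes the query always follows a "?", that a trailing fragment should be preserved, and that the value should be URL-encoded, none of which is confirmed by the original UDFs. The splitUri helper is hypothetical, and unlike the originals, this sketch doesn't route through replaceUriQueryKey.

```javascript
// Hypothetical helper: splits a URI into the part before the query,
// the query itself (without "?"), and the fragment (with "#").
function splitUri(uri) {
  const hashAt = uri.indexOf('#');
  const fragment = hashAt === -1 ? '' : uri.slice(hashAt);
  const rest = hashAt === -1 ? uri : uri.slice(0, hashAt);
  const queryAt = rest.indexOf('?');
  return {
    base: queryAt === -1 ? rest : rest.slice(0, queryAt),
    query: queryAt === -1 ? '' : rest.slice(queryAt + 1),
    fragment,
  };
}

// Removes one or more comma-delimited query keys and their values.
function removeUriQueryKey(uri, keys) {
  const drop = keys.split(',').map((k) => k.trim());
  const { base, query, fragment } = splitUri(uri);
  const kept = query
    .split('&')
    .filter((pair) => pair !== '' && !drop.includes(pair.split('=')[0]));
  return base + (kept.length ? '?' + kept.join('&') : '') + fragment;
}

// Removes any existing instances of the key, then appends key=value.
function addUriQueryKey(uri, key, value) {
  const { base, query, fragment } = splitUri(removeUriQueryKey(uri, key));
  const pair = key + '=' + encodeURIComponent(value); // encoding is an assumption
  return base + '?' + (query ? query + '&' : '') + pair + fragment;
}

const removed = removeUriQueryKey('/dir/page.cfm?a=1&b=2&c=3', 'a,c');
const added = addUriQueryKey('/page.cfm?key=old&x=1#top', 'key', 'new value');
const addedToBare = addUriQueryKey('/page.cfm', 'key', 'value');
```

Removing before appending means a key never ends up duplicated in the query string, which is the behavior described for addUriQueryKey().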

View the source code.

Now that I have these at my disposal, I frequently find myself using them in combination with each other. E.g.:

<a href="<cfoutput>#addUriQueryKey(
	getPageUri().relative,
	"key",
	"value"
)#</cfoutput>">Link</a>.

Let me know if you find any of these useful.

In other news, this cracked me up.