parseUri 1.2: Split URLs in JavaScript

I've just updated parseUri. If you haven't seen the older version, parseUri is a function which splits any well-formed URI into its parts, all of which are optional. Its combination of accuracy, flexibility, and brevity is unrivaled.

Highlights:

  • Comprehensively splits URIs, including splitting the query string into key/value pairs. (Enhanced)
  • Two parsing modes: loose and strict. (New)
  • Easy to use (returns an object, so you can do, e.g., parseUri(uri).anchor).
  • Offers convenient, pre-concatenated components (path = directory and file; authority = userInfo, host, and port; etc.)
  • Change the default names of URI parts without editing the function, by updating parseUri.options.key. (New)
  • Exceptionally lightweight (1 KB before minification or gzipping).
  • Released under the MIT License.

Details:

Older versions of this function used what's now called loose parsing mode (which is still the default in this version). Loose mode deviates slightly from the official generic URI spec (RFC 3986), but by doing so allows the function to split URIs in a way that most end users would expect intuitively. However, the finer details of loose mode preclude it from properly handling relative paths which do not start from root (e.g., "../file.html" or "dir/file.html"). On the other hand, strict mode attempts to split URIs according to RFC 3986. Specifically, in loose mode, directories don't need to end with a slash (e.g., the "dir" in "/dir?query" is treated as a directory rather than a file name), and the URI can start with an authority without being preceded by "//" (which means that the "yahoo.com" in "yahoo.com/search/" is treated as the host, rather than part of the directory path).

Since I've assumed that most developers will consistently want to use one mode or the other, the parsing mode is not specified as an argument when running parseUri, but rather as a property of the parseUri function itself. Simply run the following line of code to switch to strict mode:

parseUri.options.strictMode = true;

From that point forward, parseUri will work in strict mode (until you turn it back off).

The code:

// parseUri 1.2.2
// (c) Steven Levithan <stevenlevithan.com>
// MIT License

function parseUri (str) {
	var	o   = parseUri.options,
		m   = o.parser[o.strictMode ? "strict" : "loose"].exec(str),
		uri = {},
		i   = 14;

	while (i--) uri[o.key[i]] = m[i] || "";

	uri[o.q.name] = {};
	uri[o.key[12]].replace(o.q.parser, function ($0, $1, $2) {
		if ($1) uri[o.q.name][$1] = $2;
	});

	return uri;
};

parseUri.options = {
	strictMode: false,
	key: ["source","protocol","authority","userInfo","user","password","host","port","relative","path","directory","file","query","anchor"],
	q:   {
		name:   "queryKey",
		parser: /(?:^|&)([^&=]*)=?([^&]*)/g
	},
	parser: {
		strict: /^(?:([^:\/?#]+):)?(?:\/\/((?:(([^:@]*)(?::([^:@]*))?)?@)?([^:\/?#]*)(?::(\d*))?))?((((?:[^?#\/]*\/)*)([^?#]*))(?:\?([^#]*))?(?:#(.*))?)/,
		loose:  /^(?:(?![^:@]+:[^:@\/]*@)([^:\/?#.]+):)?(?:\/\/)?((?:(([^:@]*)(?::([^:@]*))?)?@)?([^:\/?#]*)(?::(\d*))?)(((\/(?:[^?#](?![^?#\/]*\.[^?#\/.]+(?:[?#]|$)))*\/?)?([^?#\/]*))(?:\?([^#]*))?(?:#(.*))?)/
	}
};

You can download it here.

parseUri has no dependencies, and has been tested in IE 5.5–7, Firefox 2.0.0.4, Opera 9.21, Safari 3.0.1 beta for Windows, and Swift 0.2.

parseUri: Split URLs in JavaScript

Update: The following post is outdated. See parseUri 1.2 for the latest, greatest version.

For fun, I spent the 10 minutes needed to convert my parseUri() ColdFusion UDF into a JavaScript function.

For those who haven't already seen it, I'll repeat my explanation from the other post…

parseUri() splits any well-formed URI into its parts (all are optional). Note that all parts are split with a single regex using backreferences, and all groupings which don't contain complete URI parts are non-capturing. My favorite bit of this function is its robust support for splitting the directory path and filename (it supports directories with periods, and without a trailing backslash), which I haven't seen matched in other URI parsers. Since the function returns an object, you can do, e.g., parseUri(uri).anchor, etc.

I should note that, by design, this function does not attempt to validate the URI it receives, as that would limit its flexibility. IMO, validation is an entirely unrelated process that should come before or after splitting a URI into its parts.

This function has no dependencies, and should work cross-browser. It has been tested in IE 5.5–7, Firefox 2, and Opera 9.

/* parseUri JS v0.1.1, by Steven Levithan <http://stevenlevithan.com>
Splits any well-formed URI into the following parts (all are optional):
----------------------
- source (since the exec method returns the entire match as key 0, we might as well use it)
- protocol (i.e., scheme)
- authority (includes both the domain and port)
  - domain (i.e., host; can be an IP address)
  - port
- path (includes both the directory path and filename)
  - directoryPath (supports directories with periods, and without a trailing backslash)
  - fileName
- query (does not include the leading question mark)
- anchor (i.e., fragment) */
function parseUri(sourceUri){
	var uriPartNames = ["source","protocol","authority","domain","port","path","directoryPath","fileName","query","anchor"],
		uriParts = new RegExp("^(?:([^:/?#.]+):)?(?://)?(([^:/?#]*)(?::(\\d*))?)((/(?:[^?#](?![^?#/]*\\.[^?#/.]+(?:[\\?#]|$)))*/?)?([^?#/]*))?(?:\\?([^#]*))?(?:#(.*))?").exec(sourceUri),
		uri = {};
	
	for(var i = 0; i < 10; i++){
		uri[uriPartNames[i]] = (uriParts[i] ? uriParts[i] : "");
	}
	
	/* Always end directoryPath with a trailing backslash if a path was present in the source URI
	Note that a trailing backslash is NOT automatically inserted within or appended to the "path" key */
	if(uri.directoryPath.length > 0){
		uri.directoryPath = uri.directoryPath.replace(/\/?$/, "/");
	}
	
	return uri;
}

Test it.

Is there any leaner, meaner URI parser out there? 🙂


Edit: This function doesn't currently support URIs which include a username or username/password pair (e.g., "http://user:password@domain.com/"). I didn't care about this when I originally wrote the ColdFusion UDF this is based on, since I never use such URIs. However, since I've released this I kind of feel like the support should be there. Supporting such URIs and appropriately splitting the parts would be easy. What would take longer is setting up an appropriate, large list of all kinds of URIs (both well-formed and not) to retest the function against. However, if people leave comments asking for the support, I'll go ahead and add it.

parseUri: Split URLs in ColdFusion

Update: I've added a JavaScript implementation of the following UDF. See parseUri: Split URLs in JavaScript.

Here's a UDF I wrote recently which allows me to show off my regex skillz. parseUri() splits any well-formed URI into its components (all are optional).

The core code is already very brief, but I could replace everything within the <cfloop> with one line of code if I didn't have to account for bugs in the reFind() function (tested in CF7). Note that all components are split with a single regex, using backreferences. My favorite part of this UDF is its robust support for splitting the directory path and filename (it supports directories with periods, and without a trailing backslash), which I haven't seen matched in other URI parsers.

Since the function returns a struct, you can do, e.g., parseUri(uri).anchor, etc. Check it out:

See the demo and get the source code.