parseUri 1.2: Split URLs in JavaScript

I've just updated parseUri. If you haven't seen the older version, parseUri is a function which splits any well-formed URI into its parts, all of which are optional. Its combination of accuracy, flexibility, and brevity is unrivaled.

Highlights:

  • Comprehensively splits URIs, including splitting the query string into key/value pairs. (Enhanced)
  • Two parsing modes: loose and strict. (New)
  • Easy to use (returns an object, so you can do, e.g., parseUri(uri).anchor).
  • Offers convenient, pre-concatenated components (path = directory and file; authority = userInfo, host, and port; etc.)
  • Change the default names of URI parts without editing the function, by updating parseUri.options.key. (New)
  • Exceptionally lightweight (1 KB before minification or gzipping).
  • Released under the MIT License.

Details:

Older versions of this function used what's now called loose parsing mode (which is still the default in this version). Loose mode deviates slightly from the official generic URI spec (RFC 3986), but by doing so it splits URIs in a way that most end users would intuitively expect. Specifically, in loose mode, directories don't need to end with a slash (e.g., the "dir" in "/dir?query" is treated as a directory rather than a file name), and the URI can start with an authority without being preceded by "//" (which means that the "yahoo.com" in "yahoo.com/search/" is treated as the host, rather than part of the directory path). The cost is that loose mode cannot properly handle relative paths which do not start from root (e.g., "../file.html" or "dir/file.html"). Strict mode, on the other hand, attempts to split URIs according to RFC 3986.

Since I've assumed that most developers will consistently want to use one mode or the other, the parsing mode is not specified as an argument when running parseUri, but rather as a property of the parseUri function itself. Simply run the following line of code to switch to strict mode:

parseUri.options.strictMode = true;

From that point forward, parseUri will work in strict mode (until you turn it back off).

The code:

// parseUri 1.2.2
// (c) Steven Levithan <stevenlevithan.com>
// MIT License

function parseUri (str) {
	var	o   = parseUri.options,
		m   = o.parser[o.strictMode ? "strict" : "loose"].exec(str),
		uri = {},
		i   = 14;

	while (i--) uri[o.key[i]] = m[i] || "";

	uri[o.q.name] = {};
	uri[o.key[12]].replace(o.q.parser, function ($0, $1, $2) {
		if ($1) uri[o.q.name][$1] = $2;
	});

	return uri;
};

parseUri.options = {
	strictMode: false,
	key: ["source","protocol","authority","userInfo","user","password","host","port","relative","path","directory","file","query","anchor"],
	q:   {
		name:   "queryKey",
		parser: /(?:^|&)([^&=]*)=?([^&]*)/g
	},
	parser: {
		strict: /^(?:([^:\/?#]+):)?(?:\/\/((?:(([^:@]*)(?::([^:@]*))?)?@)?([^:\/?#]*)(?::(\d*))?))?((((?:[^?#\/]*\/)*)([^?#]*))(?:\?([^#]*))?(?:#(.*))?)/,
		loose:  /^(?:(?![^:@]+:[^:@\/]*@)([^:\/?#.]+):)?(?:\/\/)?((?:(([^:@]*)(?::([^:@]*))?)?@)?([^:\/?#]*)(?::(\d*))?)(((\/(?:[^?#](?![^?#\/]*\.[^?#\/.]+(?:[?#]|$)))*\/?)?([^?#\/]*))(?:\?([^#]*))?(?:#(.*))?)/
	}
};

You can download it here.

parseUri has no dependencies, and has been tested in IE 5.5–7, Firefox 2.0.0.4, Opera 9.21, Safari 3.0.1 beta for Windows, and Swift 0.2.
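
As a rough usage sketch, the snippet below repeats the listing above so that it runs on its own, then compares loose and strict mode on the two cases described earlier and renames one key through parseUri.options.key. The sample URIs and the variable names (full, loose, strict, renamed) are illustrative assumptions, not part of parseUri.

```javascript
// parseUri 1.2.2, repeated from the listing above so this snippet is self-contained.
function parseUri (str) {
	var	o   = parseUri.options,
		m   = o.parser[o.strictMode ? "strict" : "loose"].exec(str),
		uri = {},
		i   = 14;

	while (i--) uri[o.key[i]] = m[i] || "";

	uri[o.q.name] = {};
	uri[o.key[12]].replace(o.q.parser, function ($0, $1, $2) {
		if ($1) uri[o.q.name][$1] = $2;
	});

	return uri;
}

parseUri.options = {
	strictMode: false,
	key: ["source","protocol","authority","userInfo","user","password","host","port","relative","path","directory","file","query","anchor"],
	q:   {
		name:   "queryKey",
		parser: /(?:^|&)([^&=]*)=?([^&]*)/g
	},
	parser: {
		strict: /^(?:([^:\/?#]+):)?(?:\/\/((?:(([^:@]*)(?::([^:@]*))?)?@)?([^:\/?#]*)(?::(\d*))?))?((((?:[^?#\/]*\/)*)([^?#]*))(?:\?([^#]*))?(?:#(.*))?)/,
		loose:  /^(?:(?![^:@]+:[^:@\/]*@)([^:\/?#.]+):)?(?:\/\/)?((?:(([^:@]*)(?::([^:@]*))?)?@)?([^:\/?#]*)(?::(\d*))?)(((\/(?:[^?#](?![^?#\/]*\.[^?#\/.]+(?:[?#]|$)))*\/?)?([^?#\/]*))(?:\?([^#]*))?(?:#(.*))?)/
	}
};

// A made-up URI that exercises every part.
var full = parseUri("http://user:pass@example.com:8080/dir/file.html?a=1&b=2#top");
console.log(full.host, full.port, full.file, full.anchor); // example.com 8080 file.html top
console.log(full.queryKey.b);                              // 2

// Loose mode (the default): a leading "yahoo.com" is the host,
// and "dir" in "/dir?query" is a directory.
var loose    = parseUri("yahoo.com/search/");
var looseDir = parseUri("/dir?query");
console.log(loose.host, loose.path);      // yahoo.com /search/
console.log(looseDir.directory);          // /dir

// Strict mode: without "//" there is no authority, and "dir" is a file name.
parseUri.options.strictMode = true;
var strict    = parseUri("yahoo.com/search/");
var strictDir = parseUri("/dir?query");
console.log(strict.host === "", strict.path);    // true yahoo.com/search/
console.log(strictDir.directory, strictDir.file); // / dir
parseUri.options.strictMode = false; // back to loose mode

// Rename a part without editing the function: "anchor" becomes "fragment".
parseUri.options.key[13] = "fragment";
var renamed = parseUri("/page#top");
console.log(renamed.fragment); // top
```

Because the mode and key names live on parseUri.options, both changes affect every later call until they are set back.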

178 thoughts on “parseUri 1.2: Split URLs in JavaScript”

  1. @Kyle Simpson and @Hat, thanks! Kyle, I’ll post your AS3 port here for posterity, but there’s no reason people shouldn’t get it from your site!

    //  ****************************
    //  Ported by Kyle Simpson from Javascript to AS3 from:
    //	parseUri 1.2.1
    //	(c) 2007 Steven Levithan <stevenlevithan.com>
    //	MIT License
    //  ****************************
    
    public function parseUri(str:String, strictMode:Boolean=false):Object {
    	var o:Object = new Object();
    	o.strictMode = strictMode;
    	o.key = new Array("source","protocol","authority","userInfo","user","password","host","port","relative","path","directory","file","query","anchor");
    	o.q = new Object();
    	o.q.name = "queryKey";
    	o.q.parser = /(?:^|&)([^&=]*)=?([^&]*)/g
    	o.parser = new Object();
    	o.parser.strict = /^(?:([^:\/?#]+):)?(?:\/\/((?:(([^:@]*):?([^:@]*))?@)?([^:\/?#]*)(?::(\d*))?))?((((?:[^?#\/]*\/)*)([^?#]*))(?:\?([^#]*))?(?:#(.*))?)/
    	o.parser.loose = /^(?:(?![^:@]+:[^:@\/]*@)([^:\/?#.]+):)?(?:\/\/)?((?:(([^:@]*):?([^:@]*))?@)?([^:\/?#]*)(?::(\d*))?)(((\/(?:[^?#](?![^?#\/]*\.[^?#\/.]+(?:[?#]|$)))*\/?)?([^?#\/]*))(?:\?([^#]*))?(?:#(.*))?)/
    
    	var m:Object = o.parser[o.strictMode ? "strict" : "loose"].exec(str);
    	var uri:Object = new Object();
    	var i:int = 14;
    	while (i--) uri[o.key[i]] = m[i] || "";
    	uri[o.q.name] = new Object();
    	uri[o.key[12]].replace(o.q.parser, function ($0, $1, $2) {
    		if ($1) uri[o.q.name][$1] = $2;
    	});
    	return uri;
    }
  2. Big thanks for that amazing function! It helps me a lot with my website programming!

  3. Nice job Steve, very useful… I thought I’d point out a real-world example of a slight problem I saw while using your code, though. For a URL like:

    http://www.zitnay.com/stuff/badurl.php?param=test@test

    it detects everything before the “@” as the user, thus screwing up the rest of the parse.

    I realize the “@” should really be URL encoded to “%40”, but like I said, I found a real-world example of this on a live website. So, you might consider adding support for this case, at least in the loose version.

  4. Brilliant! Thanks, this code is powerful yet very friendly, allowing non-programmers (like myself) to implement it with ease 🙂

  5. Thanks, nice code!

    I just changed the query string parsing function. It now decodes query parameters and distinguishes ‘key&’ from ‘key=&’: the first gets true as its value, the second an empty string.

    [code]
    uri[o.key[12]].replace(o.q.parser, function ($0, $1, $2, $3) {
        // $2 is "=" when present, so a bare key gets true and "key=" gets "".
        if ($1) uri[o.q.name][decodeURIComponent($1)] = $2 ? decodeURIComponent($3) : true;
    });

    parseUri.options = {
        // ... (other options unchanged)
        q: {
            name: "queryKey",
            parser: /(?:^|[&])([^&=]*)(=?)([^&]*)/g
        },
        // ...
    };
    [/code]

    And a little sugar for strings:
    [code]
    // usage: "https://blog.stevenlevithan.com/archives/parseuri".parseUri()

    String.prototype.parseUri = function () { return parseUri(this.valueOf()); };
    [/code]

  6. Steve,

    I was integrating your parser and noticed that the query parser function is “lossy”. For example, given the query string:

    “a=1&b=2&c=1&c=2&c=3”

    Parsing will result in this query hash: { a: 1, b: 2, c: 3 }

    Here is a rewritten parser function:


    if ($1) {
        if (! b9j.isValue(queryHash[$1])) {
            queryHash[$1] = $2;
        }
        else if (b9j.isArray(queryHash[$1])) {
            queryHash[$1].push($2);
        }
        else {
            queryHash[$1] = [ queryHash[$1], $2 ];
        }
    }

    You can find a full implementation here: http://appengine.bravo9.com/b9j/documentation/uri.html
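
For readers without the b9j library, the same idea can be sketched in plain JavaScript. This is only an illustration: splitQueryMulti and queryHash are hypothetical names, the regex is the one from parseUri.options.q.parser, and Array.isArray and hasOwnProperty stand in for the b9j helpers.

```javascript
// Same query pattern as parseUri.options.q.parser.
var queryParser = /(?:^|&)([^&=]*)=?([^&]*)/g;

// Hypothetical helper: like parseUri's query splitting, but repeated keys
// are collected into an array instead of overwriting each other.
function splitQueryMulti (query) {
	var queryHash = {};
	query.replace(queryParser, function ($0, $1, $2) {
		if (!$1) return;
		if (!Object.prototype.hasOwnProperty.call(queryHash, $1)) {
			queryHash[$1] = $2;                    // first occurrence: plain string
		} else if (Array.isArray(queryHash[$1])) {
			queryHash[$1].push($2);                // third and later: append
		} else {
			queryHash[$1] = [queryHash[$1], $2];   // second occurrence: start an array
		}
	});
	return queryHash;
}

var result = splitQueryMulti("a=1&b=2&c=1&c=2&c=3");
console.log(JSON.stringify(result)); // {"a":"1","b":"2","c":["1","2","3"]}
```

One trade-off of this shape is that callers must handle both strings and arrays as values.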

  7. Awesome!!!! Kudos, Steven, for such an elegant program! It served my purpose exactly the way I needed it!

    Thanks again….
    Raju

  8. Hi 🙂

    I’m new to jQuery and stumbled across your script whilst doing a uni assignment. I’m a bit lost as to how to recover array query strings. ie url?=campus[]=blahland.

    If I do parseUri(location).queryKey.campus%5B%5D it reports a JS error in Firebug.

    Is there a way to strip the [] from the query string so it’ll just be campus?

    Cheers,

    Brendan

  9. @Brendan, what’s an array query string? Assuming the URI you’re feeding this function is actually “url?campus[]=blahland”, you could access the value via parseUri(uri).queryKey["campus[]"]. Them’s JavaScript rules.

    @Robert, good point about e.g. “?c=1&c=2&c=3”. I’ll have to consider how to handle such cases in the next version of this function. Your approach (inserting an array into queryKey) seems pretty reasonable.
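
A small sketch of the bracket-notation access mentioned above. splitQuery is a hypothetical stand-in for parseUri's query handling (same regex, no decoding), so the keys come out exactly as they appear in the query string.

```javascript
// Same query pattern as parseUri.options.q.parser.
var queryParser = /(?:^|&)([^&=]*)=?([^&]*)/g;

// Hypothetical helper mirroring what parseUri stores under queryKey.
function splitQuery (query) {
	var result = {};
	query.replace(queryParser, function ($0, $1, $2) {
		if ($1) result[$1] = $2;
	});
	return result;
}

var plain   = splitQuery("campus[]=blahland");
var encoded = splitQuery("campus%5B%5D=blahland");

// Keys that are not valid identifiers need bracket notation;
// writing queryKey.campus%5B%5D is a syntax error.
console.log(plain["campus[]"]);       // blahland
console.log(encoded["campus%5B%5D"]); // blahland

// The encoded key is stored as-is; decode it first if "campus[]" is wanted.
console.log(decodeURIComponent("campus%5B%5D") === "campus[]"); // true
```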

  10. Hi,

    Could you improve your regex so it can detect a Windows drive in a local URI such as file://usr:pwd@R:/dir/dir.2/index.htm?q1=0&&test1&test2=value#top ?

    In the current version, the drive is detected as the host 🙁

    I tried to work on it on my side, but the regex is pretty complex because of the non-capturing groups.
    The idea is to add a group using a simple regex such as (\w:)

    Any ideas?

    Thanks, and really good job (really compact, I like it)

  11. Cool script, but I’m missing one important function: one that builds a URI back from the uri object. My scenario is: parse the URI, change it (the query string), and write it back to <a href="…
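
parseUri itself has no such function, but one possible sketch of the reverse step follows. buildUri is a hypothetical helper, it assumes the default key names, and the uri object is written by hand to mimic what parseUri would return for the made-up URL in the comment.

```javascript
// Hypothetical helper (not part of parseUri): rebuild a URI string from an
// object shaped like parseUri's return value, using the default key names.
function buildUri (uri) {
	var pairs = [], key, out = "";

	// Rebuild the query from queryKey so that edits to it are picked up.
	// No encoding is applied, matching parseUri, which does not decode.
	for (key in uri.queryKey) {
		if (Object.prototype.hasOwnProperty.call(uri.queryKey, key)) {
			pairs.push(key + "=" + uri.queryKey[key]);
		}
	}

	if (uri.protocol)  out += uri.protocol + ":";
	if (uri.authority) out += "//" + uri.authority;
	out += uri.path;
	if (pairs.length)  out += "?" + pairs.join("&");
	if (uri.anchor)    out += "#" + uri.anchor;
	return out;
}

// Hand-written stand-in for parseUri("http://example.com/dir/file.html?a=1#top").
var uri = {
	protocol: "http", authority: "example.com",
	path: "/dir/file.html", queryKey: { a: "1" }, anchor: "top"
};

uri.queryKey.b = "2"; // change the query string
console.log(buildUri(uri)); // http://example.com/dir/file.html?a=1&b=2#top
```

The sketch is lossy in the same places queryKey is: repeated keys and the difference between "key" and "key=" are not preserved, and an authority parsed in loose mode without "//" comes back with "//" added.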

  12. Great script! This is working really well for us with one exception. In Safari and Chrome the following URL will not parse:
    (Edit: Long URL removed.)

    I know it’s crazy… It appears to begin working when we limit the length to 450.

    Thanks again for the great script!

  13. @Matt Ruby, I haven’t done any related testing, but the problem may result from the portions of the regexes that deal with user info (user name, password). Those parts can result in a lot of backtracking with long URLs that don’t contain an @ sign, since JavaScript doesn’t have features such as possessive quantifiers, atomic groups, or duplicate subpattern numbers that would help me deal with the backtracking issues.

    If you don’t need support for the user and password properties returned by this script, one easy way to work around the issue is to change the following part of the regex (in both the strict and loose version):

    (?:(([^:@]*):?([^:@]*))?@)?

    To this:

    (?:([^:@]*:[^:@]*|[^:@]*)?@)?

    Then remove the “user” and “password” values from the parseUri.options.key array. If you try this out, please let me know if it solves the issue for you.

  14. Works like a charm!
    I also changed i = 14; to i = 12;
    and uri[o.key[12]]… to uri[o.key[10]]…

    Thanks for your help!

    -Ruby

  15. @Matt Ruby, cool, thanks for reporting back. After giving this a few more minutes of thought, here’s a way you can get rid of the backtracking problem while keeping the user and password properties around. Replace the pattern I identified earlier (in both the strict and loose regexes) with this:

    (?:(([^:@]*)(?::([^:@]*))?)?@)?

    Everything else should be left the same compared to the original script. When I have some time to more fully review all of parseUri, I’ll include this change in the next version (after the current v1.2.1).
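
To illustrate the change, here is a small sketch that runs the two user-info subpatterns side by side. The ^ anchors, the userInfo helper, and the sample strings are assumptions added so the fragments can be tested in isolation. In the earlier pattern, two adjacent [^:@]* runs are separated only by an optional colon, so a long colon-free string can be divided between them in many ways before the match gives up on finding "@". The replacement only allows the second run after a literal colon.

```javascript
// User-info subpattern before and after the change described above,
// anchored with ^ here so each can be tested on its own.
var before = /^(?:(([^:@]*):?([^:@]*))?@)?/;
var after  = /^(?:(([^:@]*)(?::([^:@]*))?)?@)?/;

// Hypothetical helper: normalize captures the way parseUri does
// (groups that did not participate become "").
function userInfo (re, str) {
	var m = re.exec(str);
	return { userInfo: m[1] || "", user: m[2] || "", password: m[3] || "" };
}

var samples = ["user:pass@host", "user@host", "host/path/with/no/at/sign"];

samples.forEach(function (s) {
	var a = JSON.stringify(userInfo(before, s)),
		b = JSON.stringify(userInfo(after, s));
	console.log(s, a === b ? "same captures" : "different captures");
});
```

Both patterns yield the same normalized user and password values for these inputs; the difference is only in how much backtracking is needed when no "@" is present.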
