parseUri 1.2: Split URLs in JavaScript
I've just updated parseUri. If you haven't seen the older version, parseUri is a function which splits any well-formed URI into its parts, all of which are optional. Its combination of accuracy, flexibility, and brevity is unrivaled.
Highlights:
- Comprehensively splits URIs, including splitting the query string into key/value pairs. (Enhanced)
- Two parsing modes: loose and strict. (New)
- Easy to use (returns an object, so you can do, e.g., parseUri(uri).anchor).
- Offers convenient, pre-concatenated components (path = directory and file; authority = userInfo, host, and port; etc.)
- Change the default names of URI parts without editing the function, by updating parseUri.options.key. (New)
- Exceptionally lightweight (1 KB before minification).
- Released under the MIT License.
Try the demo, but make sure to come back and read the details below.
Details:
Older versions of this function used what's now called loose parsing mode (which is still the default in this version). Loose mode deviates slightly from the official generic URI spec (RFC 3986), but by doing so allows the function to split URIs in a way that most end users would expect intuitively. Specifically, in loose mode, directories don't need to end with a slash (e.g., the "dir" in "/dir?query" is treated as a directory rather than a file name), and the URI can start with an authority without being preceded by "//" (which means that the "yahoo.com" in "yahoo.com/search/" is treated as the host, rather than part of the directory path). However, the finer details of loose mode preclude it from properly handling relative paths which do not start from root (e.g., "../file.html" or "dir/file.html"). Strict mode, on the other hand, attempts to split URIs according to RFC 3986.
Since I've assumed that most developers will consistently want to use one mode or the other, the parsing mode is not specified as an argument when running parseUri, but rather as a property of the parseUri function itself. Simply run the following line of code to switch to strict mode:
parseUri.options.strictMode = true;
From that point forward, parseUri will work in strict mode (until you turn it back off).
The code:
/* parseUri 1.2.1 (c) 2007 Steven Levithan <stevenlevithan.com> MIT License */

function parseUri (str) {
    var o   = parseUri.options,
        m   = o.parser[o.strictMode ? "strict" : "loose"].exec(str),
        uri = {},
        i   = 14;

    while (i--) uri[o.key[i]] = m[i] || "";

    uri[o.q.name] = {};
    uri[o.key[12]].replace(o.q.parser, function ($0, $1, $2) {
        if ($1) uri[o.q.name][$1] = $2;
    });

    return uri;
};

parseUri.options = {
    strictMode: false,
    key: ["source","protocol","authority","userInfo","user","password","host","port","relative","path","directory","file","query","anchor"],
    q: {
        name: "queryKey",
        parser: /(?:^|&)([^&=]*)=?([^&]*)/g
    },
    parser: {
        strict: /^(?:([^:\/?#]+):)?(?:\/\/((?:(([^:@]*):?([^:@]*))?@)?([^:\/?#]*)(?::(\d*))?))?((((?:[^?#\/]*\/)*)([^?#]*))(?:\?([^#]*))?(?:#(.*))?)/,
        loose: /^(?:(?![^:@]+:[^:@\/]*@)([^:\/?#.]+):)?(?:\/\/)?((?:(([^:@]*):?([^:@]*))?@)?([^:\/?#]*)(?::(\d*))?)(((\/(?:[^?#](?![^?#\/]*\.[^?#\/.]+(?:[?#]|$)))*\/?)?([^?#\/]*))(?:\?([^#]*))?(?:#(.*))?)/
    }
};
You can download it or run the test suite.
parseUri has no dependencies, and has been tested in IE 5.5–7, Firefox 2.0.0.4, Opera 9.21, Safari 3.0.1 beta for Windows, and Swift 0.2.
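The following is a minimal usage sketch, not part of the original post. It repeats the parseUri 1.2.1 code only so that the snippet runs on its own, then shows reading individual parts, comparing loose and strict mode on the "yahoo.com/search/" example, and renaming a part through parseUri.options.key. The sample URI and the replacement name "fragment" are hypothetical, chosen purely for illustration.

```javascript
// Copy of the parseUri 1.2.1 code from this post, repeated so the sketch is self-contained.
function parseUri(str) {
  var o = parseUri.options,
      m = o.parser[o.strictMode ? "strict" : "loose"].exec(str),
      uri = {},
      i = 14;
  while (i--) uri[o.key[i]] = m[i] || "";
  uri[o.q.name] = {};
  uri[o.key[12]].replace(o.q.parser, function ($0, $1, $2) {
    if ($1) uri[o.q.name][$1] = $2;
  });
  return uri;
}
parseUri.options = {
  strictMode: false,
  key: ["source","protocol","authority","userInfo","user","password","host","port","relative","path","directory","file","query","anchor"],
  q: { name: "queryKey", parser: /(?:^|&)([^&=]*)=?([^&]*)/g },
  parser: {
    strict: /^(?:([^:\/?#]+):)?(?:\/\/((?:(([^:@]*):?([^:@]*))?@)?([^:\/?#]*)(?::(\d*))?))?((((?:[^?#\/]*\/)*)([^?#]*))(?:\?([^#]*))?(?:#(.*))?)/,
    loose: /^(?:(?![^:@]+:[^:@\/]*@)([^:\/?#.]+):)?(?:\/\/)?((?:(([^:@]*):?([^:@]*))?@)?([^:\/?#]*)(?::(\d*))?)(((\/(?:[^?#](?![^?#\/]*\.[^?#\/.]+(?:[?#]|$)))*\/?)?([^?#\/]*))(?:\?([^#]*))?(?:#(.*))?)/
  }
};

// Hypothetical sample URI, used only for illustration.
var sample = "http://user:pass@example.com:8080/dir/file.html?a=1&b=2#top";

// 1. Read individual parts and query keys from the returned object.
var parts = parseUri(sample);
console.log(parts.host, parts.port, parts.file, parts.anchor, parts.queryKey.b);

// 2. Compare loose (default) and strict mode on a URI that starts with a bare authority.
var loose = parseUri("yahoo.com/search/");
parseUri.options.strictMode = true;
var strict = parseUri("yahoo.com/search/");
parseUri.options.strictMode = false; // restore the default
console.log("loose:", loose.host, loose.path);   // host "yahoo.com", path "/search/"
console.log("strict:", strict.host, strict.path); // host "", path "yahoo.com/search/"

// 3. Rename a part without editing the function. Index 13 holds the name for the anchor.
parseUri.options.key[13] = "fragment"; // "fragment" is an illustrative name
var renamed = parseUri(sample);
parseUri.options.key[13] = "anchor"; // restore the default name
console.log(renamed.fragment);
```

Because the options live on the function itself, any change (mode or key names) affects every later call, which is why the sketch restores the defaults after each experiment.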

Comment by Iván Montes on 29 June 2007:
Nice function. I miss the ability to get the query string as a list of key/value pairs, with a second argument to specify the parameter separator (with the ampersand by default). It’ll make the code a bit longer but the function would become much more complete.
Comment by Steve on 29 June 2007:
Hi Iván, thanks. parseUri already returns key/value pairs for the query string in an object called queryKey. For example, to access the value of a query key called “search” you could write parseUri(uri).queryKey.search
You can see this in action on the test page, when you click the Parse button. Does that meet your needs, or were you thinking of something different?
Comment by Iván Montes on 29 June 2007:
Sorry Steve, I didn’t notice it. That was exactly what I was talking about!
However, a minor improvement would be to accept an optional second parameter: a string in which each character is a possible argument separator. Most people use the ampersand, but according to the RFC any character except ‘?’ and ‘#’ can be used. See my post http://blog.netxus.es/blog/url-argument-separator
I myself use the semicolon ‘;’ as the argument separator in some projects, since there is no need to escape it when used in XML/XHTML documents.
Comment by Steve on 29 June 2007:
Iván, I have no plans to support arbitrary delimiters in the query string because that would significantly complicate the code for the benefit of probably less than 0.1% of developers, and as you noted in your blog post, server side languages like PHP, ASP.net, etc. generally don’t support delimiters other than “&” without special configuration, if at all.
However, it would be easy to manually change the code to support both “&” and “;” delimiters in your personal copy. Just change
/&?([^&=]*)=?([^&]*)/g from within the uri.query.replace() statement to /[&;]?([^&;=]*)=?([^&;]*)/g (or, to support only semicolons, use /;?([^;=]*)=?([^;]*)/g).
In any case, this does not affect the main URI parsing (only the splitting of the query string into key/value pairs), so you could also implement a separate function to specifically work with the query string with maximum flexibility.
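As a rough illustration of the "separate function" idea, here is a minimal sketch that splits a query string with a caller-supplied pattern, using the two regexes quoted in this comment. The helper name splitQuery and the sample query strings are hypothetical; the replace() callback simply mirrors the approach parseUri uses internally.

```javascript
// Hypothetical helper (not part of parseUri): split a query string into
// key/value pairs using whatever delimiter pattern the caller passes in.
function splitQuery(query, parser) {
  var pairs = {};
  query.replace(parser, function ($0, $1, $2) {
    if ($1) pairs[$1] = $2; // skip empty keys, as parseUri does
  });
  return pairs;
}

// Patterns quoted in the comment above.
var ampOrSemi = /[&;]?([^&;=]*)=?([^&;]*)/g; // accepts "&" and ";"
var semiOnly  = /;?([^;=]*)=?([^;]*)/g;      // accepts ";" only

// Illustrative inputs.
var mixed = splitQuery("a=1;b=2&c=3", ampOrSemi);
var semi  = splitQuery("x=1;y=2", semiOnly);

console.log(mixed); // { a: '1', b: '2', c: '3' }
console.log(semi);  // { x: '1', y: '2' }
```

In the 1.2 code shown in the post, the query regex lives in parseUri.options.q.parser, so a similar pattern could presumably be assigned there instead of editing the function body.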
Comment by Iván Montes on 29 June 2007:
Steve,
I get your point and quite agree with it. What about placing the query RE in the options object so it can be easily modified by those with special needs?
Comment by kangax on 29 June 2007:
wow, that’s one crazy regexp right there…
Comment by Steve on 30 June 2007:
@Iván Montes,
Good call. I’ve gone ahead and moved the query regex as well as the query object’s name into the options object and upped the version number from 1.1 to 1.2. Thanks for the suggestion.
Comment by Thomas Messier on 3 July 2007:
Hey Steven,
I couldn’t find any contact info so I’m just leaving a message here. I’m the maintainer of the CFJSON project and I’m trying to fix some things and would benefit from some regexp help and I know you’re quite good with them. If you think you could give me a hand shoot me an email (I put my email in the comment form) and I’ll tell you what I’m trying to fix. Something tells me you’ll be able to solve my problem without too much effort. Thanks in advance.
Comment by Steve on 3 July 2007:
Thomas, I just sent you an email. I’ll try to help if I can.
Comment by Ariel Flesler on 9 November 2007:
Just as a possibility, if you create an anchor and assign the URL as its href, you can then access the anchor’s host, port, protocol, search, and hash. This worked in Firefox; I don’t know if it will work in the rest. I’m just saying this because it might make your code shorter :)
I hope it helped
Comment by Steve on 9 November 2007:
@Ariel Flesler:
It definitely would not make the code shorter, if you wanted to keep the same functionality as is currently provided. But still, that’s an interesting idea, if it works. ;-)
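For comparison, here is a small sketch of the "let the platform parse it" idea. It does not use the anchor-element trick described above; instead it swaps in the standard URL class, which exposes similarly named properties and is assumed to be available (modern browsers and Node.js). The sample URL and the viaUrl object are illustrative only.

```javascript
// Assumption: the standard URL class is available in the runtime.
// This is a different mechanism from creating an <a> element, shown here
// because it exposes similarly named properties.
var u = new URL("http://example.com:8080/dir/file.html?search=regex#top");

var viaUrl = {
  protocol: u.protocol, // "http:" (includes the trailing colon)
  host: u.hostname,     // "example.com"
  port: u.port,         // "8080"
  path: u.pathname,     // "/dir/file.html"
  query: u.search,      // "?search=regex" (includes the leading "?")
  anchor: u.hash        // "#top" (includes the leading "#")
};

console.log(viaUrl);
```

Note that the values keep their delimiters (":", "?", "#"), there is no split into directory and file, and new URL() throws on a relative reference unless a base URL is supplied, so matching parseUri's output would still take extra code.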
Comment by Anil Gulati on 11 March 2008:
It looks like it doesn’t support the correct / standard URL parameter delimiter which is actually the ampersand entity ‘&amp;’ not a raw ampersand ‘&’. That’s precisely the functionality I am looking for as I’m getting an annoying error cropping up with one of my scripts that just uses a plain javascript split on ‘&’.
That said, here are some of the top 10 xhtml errors:
1. The use of a raw ampersand in a link query string. The W3C validator reports this as “cannot generate system identifier for general entity” because you’ve tried to create a new entity &xxxxxxx and not an encoded &amp; in the string. Replace all & with &amp; in URLs.
http://elliottback.com/wp/archives/2005/08/14/ten-steps-to-valid-html/
Comment by Steve on 11 March 2008:
Um, no. This code deals with URIs, not HTML. And of course, &amp; is not the only HTML entity to deal with.
Comment by Mark van Leeuwen on 12 March 2008:
As Safari 2.0 users may have noticed, the ‘queryString’ part isn’t filled because this version does not support a function as second parameter of String.prototype.replace() :-(
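One possible workaround, sketched here under the assumption that the limitation is exactly as described in this comment, is to split the query with a RegExp.prototype.exec() loop rather than a replace() callback. The function name splitQueryWithExec is hypothetical, and this has not been verified against Safari 2.0.

```javascript
// Hypothetical fallback: collect key/value pairs with an exec() loop, so no
// function is ever passed to String.prototype.replace().
function splitQueryWithExec(query) {
  var parser = /(?:^|&)([^&=]*)=?([^&]*)/g, // same pattern as parseUri.options.q.parser
      pairs = {},
      m;
  while ((m = parser.exec(query)) !== null) {
    if (m[1]) pairs[m[1]] = m[2]; // skip empty keys, as parseUri does
    // A zero-length match leaves lastIndex unchanged; step forward manually
    // so the loop cannot run forever (e.g., on an empty query string).
    if (m[0].length === 0) parser.lastIndex++;
  }
  return pairs;
}

var pairs = splitQueryWithExec("search=regex&page=2");
console.log(pairs); // { search: 'regex', page: '2' }
```

The result could then be assigned to the queryKey property of the object parseUri returns.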
Comment by zcrux on 1 May 2008:
I get the following error using your code
o has no properties
[Break on this error] m = o.parser[o.strictMode ? "strict" : "loose"].exec(str),
Please let me know.
Thanks in advance!
Neo
Comment by Steve on 1 May 2008:
@zcrux, does the demo page work for you?