parseUri 1.2: Split URLs in JavaScript

I've just updated parseUri. If you haven't seen the older version, parseUri is a function which splits any well-formed URI into its parts, all of which are optional. Its combination of accuracy, flexibility, and brevity is unrivaled.

Edit (2024-04-05): See `parseUri` v2!

Highlights:

Comprehensively splits URIs, including splitting the query string into key/value pairs. (Enhanced)
Two parsing modes: loose and strict. (New)
Easy to use (returns an object, so you can do, e.g., parseUri(uri).anchor).
Offers convenient, pre-concatenated components (path = directory and file; authority = userInfo, host, and port; etc.)
Change the default names of URI parts without editing the function, by updating parseUri.options.key. (New)
Exceptionally lightweight (1 KB before minification or gzipping).
Released under the MIT License.

Details:

Older versions of this function used what's now called loose parsing mode (which is still the default in this version). Loose mode deviates slightly from the official generic URI spec (RFC 3986), but by doing so allows the function to split URIs in a way that most end users would expect intuitively. Specifically, in loose mode, directories don't need to end with a slash (e.g., the "dir" in "/dir?query" is treated as a directory rather than a file name), and the URI can start with an authority without being preceded by "//" (which means that the "yahoo.com" in "yahoo.com/search/" is treated as the host, rather than part of the directory path). However, these same rules preclude loose mode from properly handling relative paths which do not start from root (e.g., "../file.html" or "dir/file.html"). Strict mode, on the other hand, attempts to split URIs according to RFC 3986.

Since I've assumed that most developers will consistently want to use one mode or the other, the parsing mode is not specified as an argument when running parseUri, but rather as a property of the parseUri function itself. Simply run the following line of code to switch to strict mode:

parseUri.options.strictMode = true;

From that point forward, parseUri will work in strict mode (until you turn it back off).

The code:

// parseUri 1.2.2
// (c) Steven Levithan <stevenlevithan.com>
// MIT License

function parseUri (str) {
	var	o   = parseUri.options,
		m   = o.parser[o.strictMode ? "strict" : "loose"].exec(str),
		uri = {},
		i   = 14;

	while (i--) uri[o.key[i]] = m[i] || "";

	uri[o.q.name] = {};
	uri[o.key[12]].replace(o.q.parser, function ($0, $1, $2) {
		if ($1) uri[o.q.name][$1] = $2;
	});

	return uri;
};

parseUri.options = {
	strictMode: false,
	key: ["source","protocol","authority","userInfo","user","password","host","port","relative","path","directory","file","query","anchor"],
	q:   {
		name:   "queryKey",
		parser: /(?:^|&)([^&=]*)=?([^&]*)/g
	},
	parser: {
		strict: /^(?:([^:\/?#]+):)?(?:\/\/((?:(([^:@]*)(?::([^:@]*))?)?@)?([^:\/?#]*)(?::(\d*))?))?((((?:[^?#\/]*\/)*)([^?#]*))(?:\?([^#]*))?(?:#(.*))?)/,
		loose:  /^(?:(?![^:@]+:[^:@\/]*@)([^:\/?#.]+):)?(?:\/\/)?((?:(([^:@]*)(?::([^:@]*))?)?@)?([^:\/?#]*)(?::(\d*))?)(((\/(?:[^?#](?![^?#\/]*\.[^?#\/.]+(?:[?#]|$)))*\/?)?([^?#\/]*))(?:\?([^#]*))?(?:#(.*))?)/
	}
};

You can download it here.

parseUri has no dependencies, and has been tested in IE 5.5–7, Firefox 2.0.0.4, Opera 9.21, Safari 3.0.1 beta for Windows, and Swift 0.2.
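
To make the difference between the two modes concrete, here is a small sketch that parses the examples from the Details section in both modes. The function and options are repeated from the listing above only so that the snippet runs on its own; parseWithMode is a hypothetical helper for this example and is not part of parseUri.

```javascript
// Copy of parseUri 1.2.2 from this post, repeated so the snippet runs standalone.
function parseUri (str) {
	var	o   = parseUri.options,
		m   = o.parser[o.strictMode ? "strict" : "loose"].exec(str),
		uri = {},
		i   = 14;

	while (i--) uri[o.key[i]] = m[i] || "";

	uri[o.q.name] = {};
	uri[o.key[12]].replace(o.q.parser, function ($0, $1, $2) {
		if ($1) uri[o.q.name][$1] = $2;
	});

	return uri;
}

parseUri.options = {
	strictMode: false,
	key: ["source","protocol","authority","userInfo","user","password","host","port","relative","path","directory","file","query","anchor"],
	q:   {
		name:   "queryKey",
		parser: /(?:^|&)([^&=]*)=?([^&]*)/g
	},
	parser: {
		strict: /^(?:([^:\/?#]+):)?(?:\/\/((?:(([^:@]*)(?::([^:@]*))?)?@)?([^:\/?#]*)(?::(\d*))?))?((((?:[^?#\/]*\/)*)([^?#]*))(?:\?([^#]*))?(?:#(.*))?)/,
		loose:  /^(?:(?![^:@]+:[^:@\/]*@)([^:\/?#.]+):)?(?:\/\/)?((?:(([^:@]*)(?::([^:@]*))?)?@)?([^:\/?#]*)(?::(\d*))?)(((\/(?:[^?#](?![^?#\/]*\.[^?#\/.]+(?:[?#]|$)))*\/?)?([^?#\/]*))(?:\?([^#]*))?(?:#(.*))?)/
	}
};

// Hypothetical helper (not part of parseUri): parse in the given mode,
// then restore whatever mode was set before.
function parseWithMode (str, strictMode) {
	var previous = parseUri.options.strictMode;
	parseUri.options.strictMode = strictMode;
	try {
		return parseUri(str);
	} finally {
		parseUri.options.strictMode = previous;
	}
}

// An authority without a leading "//"
var looseAuth  = parseWithMode("yahoo.com/search/", false);
var strictAuth = parseWithMode("yahoo.com/search/", true);
console.log(looseAuth.host, looseAuth.directory);                    // yahoo.com /search/
console.log(JSON.stringify(strictAuth.host), strictAuth.directory);  // "" yahoo.com/search/

// A directory without a trailing slash
var looseDir  = parseWithMode("/dir?query", false);
var strictDir = parseWithMode("/dir?query", true);
console.log(looseDir.directory, JSON.stringify(looseDir.file));      // /dir ""
console.log(strictDir.directory, strictDir.file);                    // / dir

// A full URI in the default (loose) mode
var full = parseUri("http://user:pass@example.com:8080/dir/file.html?a=1&b=2#top");
console.log(full.host, full.port, full.file, full.queryKey.b, full.anchor);
// example.com 8080 file.html 2 top
```

Because the mode is a property of the function rather than an argument, a wrapper like this is only worth having if you genuinely need both modes in the same script.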

177 thoughts on “parseUri 1.2: Split URLs in JavaScript”

Iván Montes says:

June 29, 2007 at 7:21 am

Nice function. I miss the ability to get the query string as a list of key/value pairs, with a second argument to specify the parameter separator (with the ampersand by default). It’ll make the code a bit longer but the function would become much more complete.
Steve says:

June 29, 2007 at 7:52 am

Hi Iván, thanks. parseUri already returns key/value pairs for the query string in an object called queryKey. For example, to access the value of a query key called “search” you could write parseUri(uri).queryKey.search

You can see this in action on the test page, when you click the Parse button. Does that meet your needs, or were you thinking of something different?
Iván Montes says:

June 29, 2007 at 8:03 am

Sorry Steve, I didn’t notice it. That was exactly what I was talking about!
However, a minor improvement would be to accept, as an optional second parameter, a string in which each char is a possible argument separator. Most people use the ampersand, but according to the RFC any char except ‘?’ and ‘#’ can be used. See my post http://blog.netxus.es/blog/url-argument-separator

I myself use the semi-colon ‘;’ as the argument separator in some projects, since there is no need to escape it when used in XML/XHTML documents.
Steve says:

June 29, 2007 at 8:43 am

Iván, I have no plans to support arbitrary delimiters in the query string because that would significantly complicate the code for the benefit of probably less than 0.1% of developers, and as you noted in your blog post, server side languages like PHP, ASP.net, etc. generally don’t support delimiters other than “&” without special configuration, if at all.

However, it would be easy to manually change the code to support both “&” and “;” delimiters in your personal copy. Just change /&?([^&=]*)=?([^&]*)/g from within the uri.query.replace() statement to /[&;]?([^&;=]*)=?([^&;]*)/g (or, to support only semicolons, use /;?([^;=]*)=?([^;]*)/g).

In any case, this does not affect the main URI parsing (only the splitting of the query string into key/value pairs), so you could also implement a separate function to specifically work with the query string with maximum flexibility.
Iván Montes says:

June 29, 2007 at 9:37 am

Steve,

I get your point and quite agree with it. What about placing the query RE in the options object so it can be easily modified by those with special needs?
kangax says:

June 29, 2007 at 5:15 pm

wow, that’s one crazy regexp right there…
Steve says:

June 30, 2007 at 9:49 am

@Iván Montes,

Good call. I’ve gone ahead and moved the query regex as well as the query object’s name into the options object and upped the version number from 1.1 to 1.2. Thanks for the suggestion.
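
As a rough illustration of what this thread is about, the sketch below splits a query string with a swappable regex, in the same way parseUri does internally. splitQuery is a hypothetical standalone helper written for this example; the first regex is the one from parseUri.options.q.parser, and the second is an assumed variant that also accepts ";" as a delimiter.

```javascript
// Hypothetical standalone helper, mirroring how parseUri fills queryKey.
function splitQuery (query, parser) {
	var pairs = {};
	query.replace(parser, function ($0, $1, $2) {
		if ($1) pairs[$1] = $2;
	});
	return pairs;
}

var ampersandOnly = /(?:^|&)([^&=]*)=?([^&]*)/g;        // as in parseUri.options.q.parser
var withSemicolon = /(?:^|[&;])([^&;=]*)=?([^&;]*)/g;   // assumed variant: "&" or ";"

var defaultPairs = splitQuery("a=1;b=2&c=3", ampersandOnly);
var mixedPairs   = splitQuery("a=1;b=2&c=3", withSemicolon);

console.log(defaultPairs); // { a: '1;b=2', c: '3' }
console.log(mixedPairs);   // { a: '1', b: '2', c: '3' }
```

With the real function, the equivalent would be assigning the second regex to parseUri.options.q.parser before calling parseUri.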
Thomas Messier says:

July 3, 2007 at 2:38 pm

Hey Steven,

I couldn’t find any contact info so I’m just leaving a message here. I’m the maintainer of the CFJSON project and I’m trying to fix some things and would benefit from some regexp help and I know you’re quite good with them. If you think you could give me a hand shoot me an email (I put my email in the comment form) and I’ll tell you what I’m trying to fix. Something tells me you’ll be able to solve my problem without too much effort. Thanks in advance.
Steve says:

July 3, 2007 at 3:09 pm

Thomas, I just sent you an email. I’ll try to help if I can.
Ariel Flesler says:

November 9, 2007 at 11:37 am

Just as a possibility, if you create an anchor, assign the url as href, then you can access the anchor’s host,port,protocol,search,hash. This worked in FF dunno if it will work in the rest. I’m just saying this because it might make your code shorter 🙂

I hope it helped
Steve says:

November 9, 2007 at 12:51 pm

@Ariel Flesler:

It definitely would not make the code shorter, if you wanted to keep the same functionality as is currently provided. But still, that’s an interesting idea, if it works. 😉
Anil Gulati says:

March 11, 2008 at 3:55 am

It looks like it doesn’t support the correct / standard URL parameter delimiter which is actually the ampersand entity ‘&amp;’ not a raw ampersand ‘&’. That’s precisely the functionality I am looking for as I’m getting an annoying error cropping up with one of my scripts that just uses a plain javascript split on ‘&’.

That said, here are some of the top 10 xhtml errors:
1. The use of a raw amperstand in a link query string. The w3c validator reports this as “cannot generate system identifier for general entity” because you’ve tried to create a new entity &xxxxxxx and not an encoded &amp; in the string. Replace all & with &amp; in urls.
http://elliottback.com/wp/archives/2005/08/14/ten-steps-to-valid-html/
Steve says:

March 11, 2008 at 11:50 am

Um, no. This code deals with URIs, not HTML. And of course, &amp; is not the only HTML entity to deal with.
Mark van Leeuwen says:

March 12, 2008 at 8:47 am

As Safari 2.0 users may have noticed, the ‘queryString’ part isn’t filled because this version does not support a function as second parameter of String.prototype.replace() 🙁
zcrux says:

May 1, 2008 at 6:08 pm

I get the following error using your code

o has no properties

[Break on this error] m = o.parser[o.strictMode ? "strict" : "loose"].exec(str),

Please let me know.

Thanks in advance!
Neo
Steve says:

May 1, 2008 at 8:18 pm

@zcrux, does the demo page work for you?
Raj says:

May 20, 2008 at 3:52 pm

Hey there,

Is the online demo using the same version of the JS that is available for download?

I have a URL that will parse in the demo just fine, but returns undefined when doing this: document.write(parseUri(urls).queryKey.q);

This is the URL btw: http://search.yahoo.com/search?p=flavor+flav&fr=yfp-t-501&toggle=1&cop=mss&ei=UTF-8
Steven Levithan says:

May 20, 2008 at 5:59 pm

@Raj, yes, the demo uses the same code, which you could have easily verified yourself (the source files are uncompressed). Your URL does not contain a q key in the query, so the line of code you posted above is working correctly.
Raj says:

May 20, 2008 at 8:09 pm

Steven,

My apologies. Initially, I felt some apprehension about browsing straight to the JS files in the demo. Just being respectful.

Then I got over it. 🙂

Raj
Hat says:

May 29, 2008 at 9:21 am

Thank you thank you thank you! In a short few weeks I’ve built at least two functions on top of this little gem and all my pages have components that will depend on them. Such a blessing!
Kyle Simpson says:

July 9, 2008 at 12:40 pm

I *love* this function, it is so incredibly powerful and helpful! Thank you so much for it.

For an open-source project I’m working on, I needed this same functionality inside of a Flash SWF. So, I’ve ported your 1.2.1 code to this regular AS3 function (not an object/class, though that would be easy to get from what I’ve done, too!).

Since escaping all that reg-ex stuff to post here in the comments would be ridiculous, I’m going to post a URL here that can be used to retrieve a text file with the code in it.

http://www.flensed.com/parseUri-AS3.txt

Steven, if you want to, grab that text file and place the formatted code somewhere on this page or in this comment, that way people who come here later won’t have to go to my site to find it. 🙂

@Kyle Simpson and @Hat, thanks! Kyle, I’ll post your AS3 port here for posterity, but there’s no reason people shouldn’t get it from your site!

//  ****************************
//  Ported by Kyle Simpson from Javascript to AS3 from:
//	parseUri 1.2.1
//	(c) 2007 Steven Levithan <stevenlevithan.com>
//	MIT License
//  ****************************

public function parseUri(str:String, strictMode:Boolean=false):Object {
	var o:Object = new Object();
	o.strictMode = strictMode;
	o.key = new Array("source","protocol","authority","userInfo","user","password","host","port","relative","path","directory","file","query","anchor");
	o.q = new Object();
	o.q.name = "queryKey";
	o.q.parser = /(?:^|&)([^&=]*)=?([^&]*)/g
	o.parser = new Object();
	o.parser.strict = /^(?:([^:\/?#]+):)?(?:\/\/((?:(([^:@]*):?([^:@]*))?@)?([^:\/?#]*)(?::(\d*))?))?((((?:[^?#\/]*\/)*)([^?#]*))(?:\?([^#]*))?(?:#(.*))?)/
	o.parser.loose = /^(?:(?![^:@]+:[^:@\/]*@)([^:\/?#.]+):)?(?:\/\/)?((?:(([^:@]*):?([^:@]*))?@)?([^:\/?#]*)(?::(\d*))?)(((\/(?:[^?#](?![^?#\/]*\.[^?#\/.]+(?:[?#]|$)))*\/?)?([^?#\/]*))(?:\?([^#]*))?(?:#(.*))?)/

	var m:Object = o.parser[o.strictMode ? "strict" : "loose"].exec(str);
	var uri:Object = new Object();
	var i:int = 14;
	while (i--) uri[o.key[i]] = m[i] || "";
	uri[o.q.name] = new Object();
	uri[o.key[12]].replace(o.q.parser, function ($0, $1, $2) {
		if ($1) uri[o.q.name][$1] = $2;
	});
	return uri;
}

Antony says:

July 14, 2008 at 3:48 am

Big thanks for that amazing function! It helps me a lot with my website programming!
Andrew Zitnay says:

August 8, 2008 at 1:09 pm

Nice job Steve, very useful… I thought I’d point out a real-world example of a slight problem I saw while using your code, though. For a URL like:

http://www.zitnay.com/stuff/badurl.php?param=test@test

it detects everything before the “@” as the user, thus screwing up the rest of the parse.

I realize the “@” should really be URL encoded to “%40”, but like I said, I found a real-world example of this on a live website. So, you might consider adding support for this case, at least in the loose version.
Brendan says:

August 11, 2008 at 12:09 am

Brilliant! Thanks, this code is powerful yet very friendly allowing non programmers (like myself) to implement it with ease 🙂
lucky says:

August 21, 2008 at 11:34 pm

Thanks, nice code!

I just changed the query string parsing function. It now decodes query parameters and distinguishes ‘key&’ from ‘key=&’: the first gets true as its value, the second an empty string.

[code]
uri[o.key[12]].replace(o.q.parser, function ($0, $1, $2, $3) {
	if ($1) uri[o.q.name][decodeURIComponent($1)] = $2 ? decodeURIComponent($3) : true;
});
…
parseUri.options = {
	…
	q: {
		name: "queryKey",
		parser: /(?:^|[&])([^&=]*)(=?)([^&]*)/g
	},
[/code]

And a little sugar for strings:
[code]
// usage: "https://blog.stevenlevithan.com/archives/parseuri".parseUri().

String.prototype.parseUri = function () { return parseUri(this.valueOf()); };
[/code]
Mark Perkins says:

August 29, 2008 at 6:11 am

Hi Steven,

Just to let you know I have put together a jQuery plugin based on your URI parser, which includes a bit of added functionality.

You can check it out here – http://projects.allmarkedup.com/jquery_url_parser/

Let me know if you have any suggestions/improvements etc! And thanks for the excellent work.
Robert says:

August 29, 2008 at 9:49 pm

Steve,

I was integrating your parser and noticed that the query parser function is “lossy”. For example, given the query string:

“a=1&b=2&c=1&c=2&c=3”

Parsing will result in this query hash: { a: 1, b: 2, c: 3 }

Here is a rewritten parser function:

if ($1) { if (! b9j.isValue(queryHash[$1])) { queryHash[$1] = $2; } else if (b9j.isArray(queryHash[$1])) { queryHash[$1].push($2); } else { queryHash[$1] = [ queryHash[$1], $2 ]; } }

You can find a full implementation here: http://appengine.bravo9.com/b9j/documentation/uri.html
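
For readers without the b9j library, the same idea can be sketched with plain JavaScript. This is an assumed rewrite for illustration, not Robert's implementation: splitQueryMulti is a hypothetical helper, and hasOwnProperty and Array.isArray stand in for b9j.isValue and b9j.isArray.

```javascript
// Hypothetical helper: like parseUri's query splitting, but repeated keys
// are collected into an array instead of overwriting each other.
function splitQueryMulti (query) {
	var queryHash = {};
	query.replace(/(?:^|&)([^&=]*)=?([^&]*)/g, function ($0, $1, $2) {
		if (!$1) return;
		if (!Object.prototype.hasOwnProperty.call(queryHash, $1)) {
			queryHash[$1] = $2;                    // first occurrence: plain string
		} else if (Array.isArray(queryHash[$1])) {
			queryHash[$1].push($2);                // third and later: append
		} else {
			queryHash[$1] = [queryHash[$1], $2];   // second occurrence: promote to array
		}
	});
	return queryHash;
}

var multi = splitQueryMulti("a=1&b=2&c=1&c=2&c=3");
console.log(multi); // { a: '1', b: '2', c: [ '1', '2', '3' ] }
```

One consequence of this design is that a value may be either a string or an array, so callers have to check which one they received.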
Robert says:

August 29, 2008 at 10:43 pm

…oh yeah, and great parser, thanks!
Raju says:

September 5, 2008 at 4:49 pm

Awesome!!!! kudos Steven for such an elegant program! It served my purpose exactly the way i needed it!

Thanks again….
Raju
Brendan says:

September 11, 2008 at 6:02 am

Hi 🙂

I’m new to jQuery and stumbled across your script whilst doing a uni assignment. I’m a bit lost as to how to recover array query strings. ie url?=campus[]=blahland.

If I do parseUri(location).queryKey.campus%5B%5D it reports a JS error in Firebug.

Is there a way to strip the [] from the query string so it’ll just be campus?

Cheers,

Brendan
Rob says:

September 12, 2008 at 4:58 pm

I’ve posted an interactive example of my
JavaScript URI object:

http://appengine.bravo9.com/b9j/example/uri/
Steven Levithan says:

September 13, 2008 at 7:11 pm

@Brendan, what’s an array query string? Assuming the URI you’re feeding this function is actually “url?campus[]=blahland”, you could access the value via parseUri(uri).queryKey["campus[]"]. Them’s JavaScript rules.

@Robert, good point about e.g. “?c=1&c=2&c=3”. I’ll have to consider how to handle such cases in the next version of this function. Your approach (inserting an array into queryKey) seems pretty reasonable.
Lex says:

December 7, 2008 at 12:27 am

Nice script! Could you please add support for domain/subdomain?
kvz says:

January 25, 2009 at 8:49 am

Hello Steven,

We would like to use your excellent code in our project over at
http://kevin.vanzonneveld.net/techblog/article/phpjs_licensing/

and in the near future at:
http://phpjs.org

We already noticed your code was MIT, but if you would like to be credited differently or have another comment, please drop a line okay?
LudoO says:

February 22, 2009 at 1:00 pm

Hi,

Could you improve your regex to add the capability to detect a Windows drive for a local URI such as file://usr:pwd@R:/dir/dir.2/index.htm?q1=0&&test1&test2=value#top ?

In the current version, the drive is detected as the host 🙁

I try to work on it on my side but regex is pretty complex due to non capturing group.
Idea is to add a group using simple regex such (\w:)

Any idea ?

Thanks and really pretty good job (really compact, i like it)
Gui says:

March 15, 2009 at 5:55 am

chrome?
Carl Armbruster says:

March 19, 2009 at 2:41 pm

awesome! – nuff said!
msznapka says:

June 2, 2009 at 8:02 am

Cool script, but I am missing one important function. Function which creates URI back from uri object. My scenario is: parse URI, change URI (query string), write URI back to <a href=”…
Matt Ruby says:

June 3, 2009 at 11:11 am

Great script! This is working really well for us with one exception. In Safari and Chrome the following URL will not parse:
(Edit: Long URL removed.)

I know it’s crazy… It appears to begin working when we limit the length to 450.

Thanks again for the great script!
Steven Levithan says:

June 3, 2009 at 2:41 pm

@Matt Ruby, I haven’t done any related testing, but the problem may result from the portions of the regexes that deal with user info (user name, password). Those parts can result in a lot of backtracking with long URLs that don’t contain an @ sign, since JavaScript doesn’t have features such as possessive quantifiers, atomic groups, or duplicate subpattern numbers that would help me deal with the backtracking issues.

If you don’t need support for the user and password properties returned by this script, one easy way to work around the issue is to change the following part of the regex (in both the strict and loose version):

(?:(([^:@]*):?([^:@]*))?@)?

To this:

(?:([^:@]*:[^:@]*|[^:@]*)?@)?

Then remove the “user” and “password” values from the parseUri.options.key array. If you try this out, please let me know if it solves the issue for you.
Steven Levithan says:

June 3, 2009 at 2:48 pm

@msznapka, on the demo page, if you look at the source, in the demo.js there’s a function called format that does something similar, and may be useful as a starting point for you.
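
A minimal sketch of the reverse direction is shown below. It is not the demo page's format function, which is not reproduced here; formatUri is a hypothetical helper that assumes an object with the default parseUri property names, rebuilds the query from queryKey, and does no encoding (parseUri does no decoding either).

```javascript
// Hypothetical helper: rebuild a URI string from an object shaped like
// parseUri's result (default key names assumed).
function formatUri (uri) {
	var out = "";
	if (uri.protocol) out += uri.protocol + ":";
	if (uri.host) {
		out += "//";
		if (uri.user) out += uri.user + (uri.password ? ":" + uri.password : "") + "@";
		out += uri.host;
		if (uri.port) out += ":" + uri.port;
	}
	out += uri.path || "";

	// Rebuild the query from queryKey so that edits to it are picked up.
	var pairs = [];
	for (var key in uri.queryKey) {
		if (Object.prototype.hasOwnProperty.call(uri.queryKey, key)) {
			pairs.push(key + "=" + uri.queryKey[key]);
		}
	}
	if (pairs.length) out += "?" + pairs.join("&");

	if (uri.anchor) out += "#" + uri.anchor;
	return out;
}

// An object in the shape parseUri would return for
// "http://example.com/search?q=regex&page=1#top"
var parts = {
	protocol: "http", user: "", password: "", host: "example.com", port: "",
	path: "/search", queryKey: { q: "regex", page: "1" }, anchor: "top"
};
parts.queryKey.page = "2";   // change the query, then write the URI back
var rebuilt = formatUri(parts);
console.log(rebuilt); // http://example.com/search?q=regex&page=2#top
```

Since queryKey keeps only one value per key and does not record whether a key had an "=" sign, a round trip through this sketch will not always reproduce the original query string exactly.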
Matt Ruby says:

June 3, 2009 at 3:19 pm

Works like a charm!
I also changed i = 14; to i = 12;
and uri[o.key[12]]… to uri[o.key[10]]…

Thanks for your help!

-Ruby
Steven Levithan says:

June 4, 2009 at 9:04 am

@Matt Ruby, cool, thanks for reporting back. After giving this a few more minutes of thought, here’s a way you can get rid of the backtracking problem while keeping the user and password properties around. Replace the pattern I identified earlier (in both the strict and loose regexes) with this:

(?:(([^:@]*)(?::([^:@]*))?)?@)?

Everything else should be left the same compared to the original script. When I have some time to more fully review all of parseUri, I’ll include this change in the next version (after the current v1.2.1).
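
To see that the replacement pattern keeps the same captures, the sketch below runs the original and revised user-info sub-patterns on a few authorities. The patterns are taken from the comments above and anchored with ^ for this test; userInfoParts is a hypothetical helper. The original places two [^:@]* runs next to each other with only an optional colon between them, so when no @ follows there are many ways to divide the text between them; the revised pattern requires a literal colon before the second run.

```javascript
var userInfoOld = /^(?:(([^:@]*):?([^:@]*))?@)?/;        // pattern in v1.2.1
var userInfoNew = /^(?:(([^:@]*)(?::([^:@]*))?)?@)?/;    // revised pattern

// Hypothetical helper: normalize missing captures to "" the way parseUri does.
function userInfoParts (regex, str) {
	var m = regex.exec(str);
	return { userInfo: m[1] || "", user: m[2] || "", password: m[3] || "" };
}

var inputs = ["usr:pwd@www.test.com", "usr@www.test.com", "www.test.com/path"];
var results = inputs.map(function (str) {
	return { input: str, old: userInfoParts(userInfoOld, str), revised: userInfoParts(userInfoNew, str) };
});

results.forEach(function (r) {
	console.log(r.input, JSON.stringify(r.old), JSON.stringify(r.revised));
});
// usr:pwd@www.test.com {"userInfo":"usr:pwd","user":"usr","password":"pwd"} {"userInfo":"usr:pwd","user":"usr","password":"pwd"}
// usr@www.test.com {"userInfo":"usr","user":"usr","password":""} {"userInfo":"usr","user":"usr","password":""}
// www.test.com/path {"userInfo":"","user":"","password":""} {"userInfo":"","user":"","password":""}
```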
