parseUri 1.2: Split URLs in JavaScript

Edit (2024): parseUri has had a major update and is now available on GitHub and npm

I've just updated parseUri. If you haven't seen the older version, parseUri is a function which splits any well-formed URI into its parts, all of which are optional. Its combination of accuracy, flexibility, and brevity is unrivaled.

Highlights:

Comprehensively splits URIs, including splitting the query string into key/value pairs. (Enhanced)
Two parsing modes: loose and strict. (New)
Easy to use (returns an object, so you can do, e.g., parseUri(uri).anchor).
Offers convenient, pre-concatenated components (path = directory and file; authority = userInfo, host, and port; etc.)
Change the default names of URI parts without editing the function, by updating parseUri.options.key. (New)
Exceptionally lightweight (1 KB before minification or gzipping).
Released under the MIT License.

Details:

Older versions of this function used what's now called loose parsing mode, which is still the default in this version. Loose mode deviates slightly from the official generic URI spec (RFC 3986), but by doing so it splits URIs the way most end users would intuitively expect. Specifically, in loose mode, directories don't need to end with a slash (e.g., the "dir" in "/dir?query" is treated as a directory rather than a file name), and the URI can start with an authority without being preceded by "//" (which means that the "yahoo.com" in "yahoo.com/search/" is treated as the host, rather than part of the directory path). The trade-off is that loose mode cannot properly handle relative paths that do not start from root (e.g., "../file.html" or "dir/file.html"). Strict mode, on the other hand, attempts to split URIs according to RFC 3986.

Since I've assumed that most developers will consistently want to use one mode or the other, the parsing mode is not specified as an argument when running parseUri, but rather as a property of the parseUri function itself. Simply run the following line of code to switch to strict mode:

parseUri.options.strictMode = true;

From that point forward, parseUri will work in strict mode (until you turn it back off).

The code:

// parseUri 1.2.2
// (c) Steven Levithan <stevenlevithan.com>
// MIT License

function parseUri (str) {
	var	o   = parseUri.options,
		m   = o.parser[o.strictMode ? "strict" : "loose"].exec(str),
		uri = {},
		i   = 14;

	while (i--) uri[o.key[i]] = m[i] || "";

	uri[o.q.name] = {};
	uri[o.key[12]].replace(o.q.parser, function ($0, $1, $2) {
		if ($1) uri[o.q.name][$1] = $2;
	});

	return uri;
};

parseUri.options = {
	strictMode: false,
	key: ["source","protocol","authority","userInfo","user","password","host","port","relative","path","directory","file","query","anchor"],
	q:   {
		name:   "queryKey",
		parser: /(?:^|&)([^&=]*)=?([^&]*)/g
	},
	parser: {
		strict: /^(?:([^:\/?#]+):)?(?:\/\/((?:(([^:@]*)(?::([^:@]*))?)?@)?([^:\/?#]*)(?::(\d*))?))?((((?:[^?#\/]*\/)*)([^?#]*))(?:\?([^#]*))?(?:#(.*))?)/,
		loose:  /^(?:(?![^:@]+:[^:@\/]*@)([^:\/?#.]+):)?(?:\/\/)?((?:(([^:@]*)(?::([^:@]*))?)?@)?([^:\/?#]*)(?::(\d*))?)(((\/(?:[^?#](?![^?#\/]*\.[^?#\/.]+(?:[?#]|$)))*\/?)?([^?#\/]*))(?:\?([^#]*))?(?:#(.*))?)/
	}
};
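
As a rough illustration of the two modes described above, the following sketch runs the same inputs through loose and strict mode. The function and its options are repeated from the listing so that the snippet runs on its own; the variable names (looseResult, strictResult, strictDir) are only for illustration.

```javascript
// parseUri 1.2.2, repeated from the listing above so this snippet is self-contained
function parseUri (str) {
	var	o   = parseUri.options,
		m   = o.parser[o.strictMode ? "strict" : "loose"].exec(str),
		uri = {},
		i   = 14;

	while (i--) uri[o.key[i]] = m[i] || "";

	uri[o.q.name] = {};
	uri[o.key[12]].replace(o.q.parser, function ($0, $1, $2) {
		if ($1) uri[o.q.name][$1] = $2;
	});

	return uri;
}

parseUri.options = {
	strictMode: false,
	key: ["source","protocol","authority","userInfo","user","password","host","port","relative","path","directory","file","query","anchor"],
	q:   {
		name:   "queryKey",
		parser: /(?:^|&)([^&=]*)=?([^&]*)/g
	},
	parser: {
		strict: /^(?:([^:\/?#]+):)?(?:\/\/((?:(([^:@]*)(?::([^:@]*))?)?@)?([^:\/?#]*)(?::(\d*))?))?((((?:[^?#\/]*\/)*)([^?#]*))(?:\?([^#]*))?(?:#(.*))?)/,
		loose:  /^(?:(?![^:@]+:[^:@\/]*@)([^:\/?#.]+):)?(?:\/\/)?((?:(([^:@]*)(?::([^:@]*))?)?@)?([^:\/?#]*)(?::(\d*))?)(((\/(?:[^?#](?![^?#\/]*\.[^?#\/.]+(?:[?#]|$)))*\/?)?([^?#\/]*))(?:\?([^#]*))?(?:#(.*))?)/
	}
};

// Loose mode (the default): a leading "yahoo.com" is read as the host.
var looseResult = parseUri("yahoo.com/search/");
console.log(looseResult.host, looseResult.directory);   // host "yahoo.com", directory "/search/"

// Strict mode: without a leading "//" there is no authority, so it is all path.
parseUri.options.strictMode = true;
var strictResult = parseUri("yahoo.com/search/");
console.log(strictResult.host === "", strictResult.path);   // empty host, path "yahoo.com/search/"

// Strict mode also reads "dir" in "/dir?query" as a file name, not a directory.
var strictDir = parseUri("/dir?query");
console.log(strictDir.directory, strictDir.file, strictDir.queryKey);   // directory "/", file "dir"

// Switch back so later calls use the default again.
parseUri.options.strictMode = false;
```

Because the mode lives on parseUri.options rather than in an argument, the flag stays set for every later call until it is changed back.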

You can download it here.

parseUri has no dependencies, and has been tested in IE 5.5–7, Firefox 2.0.0.4, Opera 9.21, Safari 3.0.1 beta for Windows, and Swift 0.2.

177 thoughts on “parseUri 1.2: Split URLs in JavaScript”

Matt Ruby says:

June 5, 2009 at 1:17 pm

Thanks again! I’ve made your suggested change and things are still working well.

I look forward to the next version.

-Ruby

I did a PHP port of this amazing function.
I added 2 features.
Hope it could be useful !!
Have fun !

LudoO

<?php
/*
	PhpParseUri 1.0

	PHP Port of parseUri 1.2.1
		- added file:///
		- added : windows drive detection
		
	PHP Port: LudoO 2009 <pitaso.com>

	Original JS : (c) 2007 Steven Levithan <stevenlevithan.com>	MIT License
	
*/
function parseUri($str) {
	global $parseUri_options;
	$o   = $parseUri_options;
	$r   = $o['parser'][$o['strictMode'] ? "strict" : "loose"];
	preg_match($r, $str, $m);
	$uri = array();
	$i   = 15;

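	// preg_match() omits trailing unmatched groups, so default missing parts to ""
	while ($i--) $uri[$o['key'][$i]] = isset($m[$i]) ? $m[$i] : "";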

	$uri[$o['q']['name']] = array();
	// PREG_SET_ORDER yields one array(full match, key, value) per query pair
	preg_match_all($o['q']['parser'], $uri[$o['key'][13]], $n, PREG_SET_ORDER);
	foreach ($n as $v) {
		if ($v[1] !== "") $uri[$o['q']['name']][$v[1]] = $v[2];
	}
	return $uri;
};
$parseUri_options = array(
	'strictMode' => false,
	'key' => array("source","protocol","authority","userInfo","user","password","host","port","relative","path","drive","directory","file","query","anchor"),
	'q' =>   array(
		'name' =>   "queryKey",
		'parser' => '/(?:^|&)([^&=]*)=?([^&]*)/'
	),
	'parser' => array(
		'strict' => '/^(?:([^:\/?#]+):)?(?:\/\/\/?((?:(([^:@]*):?([^:@]*))?@)?([^:\/?#]*)(?::(\d*))?))?(((?:\/(\w:))?((?:[^?#\/]*\/)*)([^?#]*))(?:\?([^#]*))?(?:#(.*))?)/',
		'loose' =>  '/^(?:(?![^:@]+:[^:@\/]*@)([^:\/?#.]+):)?(?:\/\/\/?)?((?:(([^:@]*):?([^:@]*))?@)?([^:\/?#]*)(?::(\d*))?)(((?:\/(\w:))?(\/(?:[^?#](?![^?#\/]*\.[^?#\/.]+(?:[?#]|$)))*\/?)?([^?#\/]*))(?:\?([^#]*))?(?:#(.*))?)/'
	)
);
?>

Pingback: Parsing URLs in Javascript | Pressing the Red Button
j says:

July 25, 2009 at 3:48 pm

this function does not work for urls with ipv6 addresses in them

an example url:
http://[2001:4860:a003::68]/search?q=parseUri+ipv6
Jeff Jackson says:

September 17, 2009 at 10:59 pm

Thanks for this! One small issue: if a URL has multiple instances of the same parameter name (as can occur if multiple checkboxes are checked on a form) then queryKey will only contain the last value associated with this name. So if the query string is

x=a&x=b

then queryKey.x ends up as ‘b’, and the x=a pair is lost. It might be nicer if, say, an array of strings rather than a single string was assigned to the corresponding element of queryKey in such cases.
Ildar says:

October 2, 2009 at 4:16 am

Hi Steven,

I once implemented a URI parser in JavaScript. It works as a method on String.prototype, so you can parse input strings in JavaScript style:

var url = 'https://blog.stevenlevithan.com/archives/parseuri';
var url_parsed = url.parseUrl();

The source is located here:
http://with-love-from-siberia.blogspot.com/2009/07/url-parsing-in-javascript.html
Pingback: Trial and Erorr | Backseat Surfer
Mike says:

January 29, 2010 at 8:38 pm

I was using your parseUri class in JavaScript and noticed that the path “file.ext” is not considered a file by your script. It looks like it is expecting a preceding slash. I don’t know if this is a bug or just a feature that was left out. My workaround was just some extra conditional tests. Nice work on this class and thanks for posting it.
Steven Levithan says:

February 3, 2010 at 3:34 am

@Mike, in non-strict mode, a URL starting with a filename (i.e., no preceding slash) is treated as a host, for reasons explained elsewhere. Switch to strict mode and you should be OK.
Carter Cole says:

February 10, 2010 at 5:53 pm

your code is AWESOME! thanks for your stuff… i stole your dump function too 🙂 ive been looking for something like it for awhile
Anup Chatterjee says:

April 22, 2010 at 8:44 pm

Great and simple parser. But only one issue I have here – the open social type of url links do not work with your parser, e.g.
https://blog.stevenlevithan.com/user/@me/folder/@root
Pingback: Golingo: a great Titanium mobile Web game, open sourced for you | Intertech
Scott says:

May 27, 2010 at 1:59 pm

Hi Steven,

Thanks for the great code! If you’re still maintaining it, here’s an example of a url that has issues:

http://www.contemporaryartdaily.com/wp-content/uploads/2010/05/4.-AC-2010-InstallShot@Maccarone-NorthGallery-150x150.jpg
Steven Levithan says:

May 28, 2010 at 1:50 am

@Scott, that URL is invalid. The @ sign is supposed to be URL encoded as “%40”–if you make this change, parseUri will handle it fine.

FYI, you’re not the first person to request different handling for invalid uses of “@” (see Anup Chatterjee and Andrew Zitnay’s comments here), so perhaps it’s worth looking at changes to “loose” parsing mode, at least, in future versions of this script.
James says:

June 2, 2010 at 9:11 am

Awesome code. Ultimate Regex.
I found this letter >a< knocking around, which i present to you for insertion into the appropriate position in your surname.
Brandon Sterne says:

June 11, 2010 at 6:12 pm

Do you think it’s a bug that jQuery.url.setUrl("script.js").attr("host") returns script.js?

PHP’s parse_url, for example, is able to parse that string as the path.
Brandon Sterne says:

June 11, 2010 at 6:28 pm

Sorry, strict mode seems to address that, but the jQuery plugin doesn’t seem to use strict mode properly. I will investigate on that end. Thanks for the great parser!
Neeta says:

July 2, 2010 at 6:11 pm

Nice Function. It works 99% of the times..
I tried it with google search results url as given below and it broke.. query is null!!

http://www.google.com/#hl=en&source=hp&q=dvd+player&aq=f&aqi=g10&aql=&oq=&gs_rfai=CHmF57nEuTJ-4E5-yMcr6vYgKAAAAqgQFT9CPQHY&fp=7e78d8b98f604090
Steven Levithan says:

July 4, 2010 at 1:14 am

@James, lol.

@Neeta, parseUri returns the correct result. Everything after the # sign is the URI fragment (aka anchor). There is no query part.
Rien Broekstra says:

July 13, 2010 at 2:42 pm

Dear Mr. Levithan, hello Steven,

Thanks a lot for your javascript magic.

I have a feature suggestion if that’s appropriate. Since parseUri is returning an object, would it be a decent idea to add a method which would reassemble the URL back to a string from its parts?

That could be quite handy for actually manipulating URL’s. One could then alter parts of the URL (add or modify parts of the query, the user information, whatsoever) without doing any regexp magic in their own code.

Cheers,
—
Rien
Johann says:

July 17, 2010 at 12:16 pm

@Rien,

there is a source property in the returned object that contains the original string.

Thanks Steven for this script, I use it together with a Punycode encoder on some proxies to support IDN domains and parseUri has helped a lot keeping the code small.
Leechael says:

August 12, 2010 at 3:24 am

It breaks on URLs like http://www.blahblah.com/@foo/bar ….
Adam says:

September 1, 2010 at 10:05 am

Very nice! It does break on the URL http://www.blahblah.com/@foo/bar as Leechael noted, but still, for how simple it is, I am duly impressed. My MUCH longer version of a strict RFC-3986 parser written in C is over at github (http://github.com/ajrisi/fsm). I didn’t use regex like you, I used a hand-rolled finite state machine. The output isn’t quite as readable as yours either. Still, might be worth something to someone!
Pingback: Parsowanie URL w JavaScript at Jakub Laskowski
ridgerunner says:

November 11, 2010 at 6:54 pm

Actually, the ‘@’ sign is a perfectly valid character for the path, query and fragment portions of a URI according to RFC3986 and does not need to be encoded as ‘%40’.

Look at the ABNF definition for ‘pchar’ in Appendix A of RFC3986.
Steven Levithan says:

November 12, 2010 at 1:31 am

@ridgerunner, thanks for the details. I will correct for that in future versions of this script.
User says:

February 1, 2011 at 6:16 am

Just wanna say: AWESOME WORK!
Jens Weiermann says:

April 1, 2011 at 2:03 am

Hi,

I’ve taken Robert’s idea of creating an array of values for parameters that were given multiple times. Unlike him, I did so without using a third party library. Here’s the code if anyone’s interested:

Replace the line

if ($1) uri[o.q.name][$1] = $2;

with

if ($1) {
	if (uri[o.q.name][$1] === undefined) {
		uri[o.q.name][$1] = $2;
	} else if (Object.prototype.toString.call(uri[o.q.name][$1]) === '[object Array]') {
		// typeof never returns '[object Array]'; Object.prototype.toString does
		uri[o.q.name][$1].push($2);
	} else if (typeof uri[o.q.name][$1] === 'string') {
		uri[o.q.name][$1] = [uri[o.q.name][$1], $2];
	}
}
Pingback: MikeCann.co.uk » Blog Archive » URI Parser For HaXe
jojo says:

May 7, 2011 at 6:08 pm

great stuff.
i just love regexp.
very cool
Brad says:

May 25, 2011 at 1:08 pm

Evil corner case >:)
http://www.test.com/path?__proto__=1
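
A minimal sketch of why this key is awkward, and one possible workaround: on an ordinary object, assigning a string to "__proto__" goes to the inherited setter and is silently dropped, so the pair never shows up in queryKey. The helper below (parseQuerySafely, a hypothetical name that is not part of parseUri) reuses the query regex from parseUri.options.q.parser but stores the pairs on an object created with Object.create(null), which has no such setter.

```javascript
// Hypothetical helper, not part of parseUri: same pair-splitting regex as
// parseUri.options.q.parser, but the result object has no prototype.
function parseQuerySafely(query) {
	var pairs = Object.create(null);
	query.replace(/(?:^|&)([^&=]*)=?([^&]*)/g, function ($0, $1, $2) {
		if ($1) pairs[$1] = $2;
	});
	return pairs;
}

// On a plain object the assignment hits the inherited __proto__ setter,
// which ignores string values, so no own property is created.
var plain = {};
plain["__proto__"] = "1";
console.log(Object.prototype.hasOwnProperty.call(plain, "__proto__"));   // false

// On a prototype-less object "__proto__" is just another key.
var safe = parseQuerySafely("__proto__=1&x=2");
console.log(safe["__proto__"], safe.x);   // 1 2
```

The trade-off is that an object without a prototype also lacks methods such as hasOwnProperty and toString, so code that consumes it has to call those through Object.prototype.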
Pingback: Javascript URI parser | Jeff Wang's Blog
Niall Smart says:

July 16, 2011 at 12:22 pm

If you need to go the other way (i.e., object spec to URI string), makeUri() can help:

https://gist.github.com/1073037
Yaffle says:

July 18, 2011 at 3:17 am

javascript absolutize URL : https://gist.github.com/1088850
Adam says:

July 27, 2011 at 7:02 am

Can you remove the maxlength on the input box? Thanks.
84 says:

September 5, 2011 at 10:06 pm

Hello…nice site…https://blog.stevenlevithan.com/archives/parseuri is The Best! Please keep it up webmaster….great job…thumbs up!
hepsignman says:

October 3, 2011 at 8:48 am

Newbie question.
We have a job application form that uses the document.referrer to identify which job they are applying for. And, I want to add this info into the subject line of the email sent to our hr person. I came across your code that will parse the url.
how do I take what your parser produces so I can add it to the subject line?
sosoflickr says:

December 26, 2011 at 3:14 am

great stuff.
thanks!
Norbert Klasen says:

January 12, 2012 at 3:49 am

Hi,

I’ve enhanced the loose mode a bit to support and tokenize literal IPv4 and IPv6 addresses as well as splitting an FQDN into hostname and domain.

Thanks
Norbert

String.prototype.parseUri = function() {
	var o = String.prototype.parseUri.options;
	var m = o.parser.ipv6.exec(this);
	var uri = {};
	var i = 18;
	while (i--)
		uri[o.key[i]] = m[i] || "";

	uri[o.q.name] = {};
	uri[o.key[16]].replace(o.q.parser, function($0, $1, $2) {
		if ($1)
			uri[o.q.name][$1] = $2;
	});

	if (uri.ipv4 != "") {
		uri.ip = uri.ipv4;
	}
	else
	if (uri.ipv6 != "") {
		uri.ip = uri.ipv6;
	}

	return uri;
};

String.prototype.parseUri.options = {
	// strictMode : false,
	key : [ "source", "protocol", "authority", "userInfo", "user", "password",
		"host", "ipv4", "ipv6", "basename", "domain", "port", "relative",
		"path", "directory", "file", "query", "anchor" ],
	q : {
		name : "queryKey",
		parser : /(?:^|&)([^&=]*)=?([^&]*)/g
	},
	parser : {
		ipv6 : /^(?:(?![^:@]+:[^:@\/]*@)([^[:\/?#.]+):)?(?:\/\/)?((?:(([^:@]*)(?::([^:@]*))?)?@)?((?:(\d+\.\d+\.\d+\.\d+)|\[([a-fA-F0-9:]+)\]|([^.:\/?#]*))(?:\.([^:\/?#]*))?)(?::(\d*))?)(((\/(?:[^?#](?![^?#\/]*\.[^?#\/.]+(?:[?#]|$)))*\/?)?([^?#\/]*))(?:\?([^#]*))?(?:#(.*))?)/
	}
};
Pablo Pazos says:

February 11, 2012 at 8:19 pm

Hi, great piece of code.

BTW, is there any function for doing the opposite operation? (I mean getting the string URL from the parsed one).

It could be useful when you need to change some URL params on the query string or any part of the URL. (that’s indeed what I need to do on a CMS I’m building: http://code.google.com/p/yupp-cms/)

Thanks a lot!
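
For the reverse operation asked about here, a small sketch is possible. It assumes an object that uses parseUri's default part names (protocol, userInfo, host, port, path, query, anchor). The function name buildUri is hypothetical and not part of parseUri. It reads the query part as a string and ignores queryKey, and it does not cover forms with an empty authority such as "file:///".

```javascript
// Hypothetical helper, not part of parseUri: joins the parts of a parsed URI
// back into a string, using parseUri's default key names.
function buildUri(parts) {
	var uri = "";
	if (parts.protocol) uri += parts.protocol + ":";
	if (parts.host) {
		uri += "//";
		if (parts.userInfo) uri += parts.userInfo + "@";
		uri += parts.host;
		if (parts.port) uri += ":" + parts.port;
	}
	uri += parts.path || "";
	if (parts.query) uri += "?" + parts.query;
	if (parts.anchor) uri += "#" + parts.anchor;
	return uri;
}

// Change one part of a URI and put it back together.
var parts = {
	protocol: "http",
	userInfo: "usr:pwd",
	host: "www.test.com",
	port: "81",
	path: "/dir/index.htm",
	query: "q1=0",
	anchor: "top"
};
parts.query = "q1=1";
console.log(buildUri(parts));   // http://usr:pwd@www.test.com:81/dir/index.htm?q1=1#top
```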
Doug says:

February 16, 2012 at 11:12 am

Steve,

I am trying to use your code with server side script in domino xpages. Everything works great except the parameter value returned is a number instead of the correct string. All other values are returned correctly
Using http://usr:pwd@www.test.com:81/dir/dir.2/index.htm?q1=0&&test1&test2=value#top
uri.queryKey.q1=0
uri.queryKey.test1=5
uri.queryKey.test2=11

Thanks for contributing.
Doug says:

February 16, 2012 at 11:30 am

Steve,

FYI- I used a Domino function to grab from $0 everything to the right of the = sign instead of using $2 and it now returns the correct values. Thanks again for contributing. Great code.
Sean Bannister says:

March 11, 2012 at 1:46 am

The semicolon at the end of the parseUri function isn’t required as mentioned in section 13 of the spec http://ecma262-5.com/ELS5_HTML.htm#Section_13
?? ???? says:

April 7, 2012 at 10:36 pm

Nice post really , Thanks for sharing.
JAY says:

April 9, 2012 at 11:22 am

Hi author, I would like to ask how I could use this code as JavaScript that displays the incoming search term on my site. For example, when someone searches in Google and arrives at my site, the URL of the referrer is parsed and displayed on my website as "incoming search term (keyword)". Any suggestions about doing this in JavaScript?
Erik Dubbelboer says:

April 17, 2012 at 4:54 am

Hi Steven,

As some others have pointed out already your code doesn’t work for urls with a @ in the path or query part.

For example http://www.adperium.com/campaigns/example@example.com/93f92b1c will return example.com as the host.

According to rfc 3986 section-3.3, @ is a valid path character.

Since adblock for chrome uses your code to parse urls it currently blocks parts of sites that shouldn’t be blocked.
Pingback: JavaScript or Query library to work with paths/URIs | Easy jQuery | Free Popular Tips Tricks Plugins API Javascript and Themes
Francis Cagney says:

July 9, 2012 at 5:43 pm

This was just what I wanted except: I’m passing some javascript code with spaces and funny chars in the string. So I wrote a little extension to decode these:

So I replaced the line:
if ($1) uri[o.q.name][$1] = $2
with
if ($1) uri[o.q.name][$1] = $2.replace(o.q.decode, function ($3, $4) {
return ($4) ? String.fromCharCode (parseInt($4, 16)) : " ";
});

and added this to the q structure.
decode: /(?:\+)|(?:%(..))/g
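
One limitation of decoding each %XX escape with String.fromCharCode is that multi-byte UTF-8 sequences such as %C3%A9 come out as two separate characters. As an alternative sketch, the built-in decodeURIComponent handles those sequences. The helper name decodeQueryValue is hypothetical, and falling back to the undecoded text on malformed input is an assumption about what a caller would want.

```javascript
// Hypothetical helper, not part of parseUri: decodes one query-string value.
function decodeQueryValue(value) {
	// In form-encoded query strings "+" stands for a space; a literal plus
	// arrives as %2B and is decoded afterwards, so it is preserved.
	var spaced = value.replace(/\+/g, " ");
	try {
		return decodeURIComponent(spaced);
	} catch (e) {
		// decodeURIComponent throws URIError on malformed escapes (e.g. "100%");
		// in that case return the text without percent-decoding.
		return spaced;
	}
}

console.log(decodeQueryValue("a+b%20c"));   // a b c
console.log(decodeQueryValue("%C3%A9"));    // é
console.log(decodeQueryValue("1%2B1"));     // 1+1
```

Inside parseUri this could be called on $2 (and, if wanted, on $1) in the line that assigns uri[o.q.name][$1].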
