Flagrant Badassery

A JavaScript and regular expression centric blog

JavaScript split Bugs: Fixed!

The String.prototype.split method is very handy, so it's a shame that when you use a regular expression as its delimiter, the results differ so wildly across browsers that odds are you've just introduced bugs into your code (unless you know precisely what kind of data you're working with and can avoid the issues). Here's one example of other people venting about the problems. The cross-browser inconsistencies when using regexes with split are as follows:

  • Internet Explorer excludes almost all empty values from the resulting array (e.g., when two delimiters appear next to each other in the data, or when a delimiter appears at the start or end of the data). This doesn't make any sense to me, since IE does include empty values when using a string as the delimiter.
  • Internet Explorer and Safari do not splice the values of capturing parentheses into the returned array (this functionality can be useful with simple parsers, etc.).
  • Firefox does not splice undefined values into the returned array as the result of non-participating capturing groups.
  • Internet Explorer, Firefox, and Safari have various additional edge-case bugs where they do not follow the split specification (which is actually quite complex).
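
For reference, here's a minimal sketch of what the spec calls for in the first three cases above. The name specExamples and the sample strings are my own illustrations (they aren't taken from the test page); a compliant implementation should produce the arrays noted in the comments.

```javascript
// Illustrative only: `specExamples` is a hypothetical name, not part of the script.
var specExamples = {
    // adjacent, leading, and trailing delimiters all yield empty strings
    // -> ["", "a", "", "b", ""]
    emptyValues: ",a,,b,".split(/,/),

    // text matched by capturing parentheses is spliced into the result
    // -> ["1", ":", "2", ":", "3"]
    captures: "1:2:3".split(/(:)/),

    // a capturing group that doesn't participate in a match yields `undefined`
    // -> ["A", undefined, "B", "bold", "/", "B", ""]
    nonParticipating: "A<B>bold</B>".split(/<(\/)?([^<>]+)>/)
};
```

A browser with any of the bugs listed above will return something different for at least one of these.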

The situation is so bad that I've simply avoided using regex-based splitting in the past.

That ends now. ;-)

The following script provides a fast, uniform cross-browser implementation of String.prototype.split, and attempts to precisely follow the relevant spec (ECMA-262 v3 §15.5.4.14, pp.103,104).

I've also created a fairly quick and dirty page where you can test the results of more than 50 usages of JavaScript's split method, and quickly compare your browser's results with the correct implementation. On the test page, the pink lines in the third column highlight incorrect results from the native split method. The rightmost column shows the results of the script below. It's all green in every browser I've tested (IE 5.5 – 7, Firefox 2.0.0.4, Opera 9.21, Safari 3.0.1 beta, and Swift 0.2).

Run the tests in your browser.
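
If you'd rather check a handful of cases yourself, the idea behind such a comparison can be sketched roughly as follows. This is not the code behind the test page; the names sameArray, runSplitTests, nativeSplit, and sampleCases are all hypothetical, and the four cases are just a small illustrative sample.

```javascript
// Hypothetical mini harness; every name here is illustrative only.

// Strict element-by-element comparison, so `undefined` and "" stay distinct.
function sameArray(a, b) {
    if (a.length !== b.length) {
        return false;
    }
    for (var i = 0; i < a.length; i++) {
        if (a[i] !== b[i]) {
            return false;
        }
    }
    return true;
}

// Runs `splitFn(str, separator, limit)` on each case and collects the mismatches.
function runSplitTests(splitFn, cases) {
    var failures = [];
    for (var i = 0; i < cases.length; i++) {
        var c = cases[i],
            actual = splitFn(c.str, c.separator, c.limit);
        if (!sameArray(actual, c.expected)) {
            failures.push({ testCase: c, actual: actual });
        }
    }
    return failures;
}

// Wraps whatever `split` the current environment provides.
function nativeSplit(str, separator, limit) {
    return String.prototype.split.call(str, separator, limit);
}

var sampleCases = [
    { str: "a,,b",  separator: /,/,     expected: ["a", "", "b"] },
    { str: "1:2:3", separator: /(:)/,   expected: ["1", ":", "2", ":", "3"] },
    { str: "ab",    separator: /(x)?b/, expected: ["a", undefined, ""] },
    { str: "a,b,c", separator: /,/,     limit: 2, expected: ["a", "b"] }
];

var failures = runSplitTests(nativeSplit, sampleCases);
```

An empty failures array means the split being tested agrees with the expected results for these cases; each entry otherwise records the case and what was actually returned.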

Here's the script:

/* Cross-Browser Split 1.0.1
(c) Steven Levithan <stevenlevithan.com>; MIT License
An ECMA-compliant, uniform cross-browser split method */

var cbSplit;

// avoid running twice, which would break `cbSplit._nativeSplit`'s reference to the native `split`
if (!cbSplit) {

cbSplit = function (str, separator, limit) {
    // if `separator` is not a regex, use the native `split`
    if (Object.prototype.toString.call(separator) !== "[object RegExp]") {
        return cbSplit._nativeSplit.call(str, separator, limit);
    }

    var output = [],
        lastLastIndex = 0,
        flags = (separator.ignoreCase ? "i" : "") +
                (separator.multiline  ? "m" : "") +
                (separator.sticky     ? "y" : ""),
        separator = RegExp(separator.source, flags + "g"), // make `global` and avoid `lastIndex` issues by working with a copy
        separator2, match, lastIndex, lastLength;

    str = str + ""; // type conversion
    if (!cbSplit._compliantExecNpcg) {
        separator2 = RegExp("^" + separator.source + "$(?!\\s)", flags); // doesn't need /g or /y, but they don't hurt
    }

    /* behavior for `limit`: if it's...
    - `undefined`: no limit.
    - `NaN` or zero: return an empty array.
    - a positive number: use `Math.floor(limit)`.
    - a negative number: no limit.
    - other: type-convert, then use the above rules. */
    if (limit === undefined || +limit < 0) {
        limit = Infinity;
    } else {
        limit = Math.floor(+limit);
        if (!limit) {
            return [];
        }
    }

    while (match = separator.exec(str)) {
        lastIndex = match.index + match[0].length; // `separator.lastIndex` is not reliable cross-browser

        if (lastIndex > lastLastIndex) {
            output.push(str.slice(lastLastIndex, match.index));

            // fix browsers whose `exec` methods don't consistently return `undefined` for nonparticipating capturing groups
            if (!cbSplit._compliantExecNpcg && match.length > 1) {
                match[0].replace(separator2, function () {
                    for (var i = 1; i < arguments.length - 2; i++) {
                        if (arguments[i] === undefined) {
                            match[i] = undefined;
                        }
                    }
                });
            }

            if (match.length > 1 && match.index < str.length) {
                Array.prototype.push.apply(output, match.slice(1));
            }

            lastLength = match[0].length;
            lastLastIndex = lastIndex;

            if (output.length >= limit) {
                break;
            }
        }

        if (separator.lastIndex === match.index) {
            separator.lastIndex++; // avoid an infinite loop
        }
    }

    if (lastLastIndex === str.length) {
        if (lastLength || !separator.test("")) {
            output.push("");
        }
    } else {
        output.push(str.slice(lastLastIndex));
    }

    return output.length > limit ? output.slice(0, limit) : output;
};

cbSplit._compliantExecNpcg = /()??/.exec("")[1] === undefined; // NPCG: nonparticipating capturing group
cbSplit._nativeSplit = String.prototype.split;

} // end `if (!cbSplit)`

// for convenience...
String.prototype.split = function (separator, limit) {
    return cbSplit(this, separator, limit);
};
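
Here's a quick sketch of calling the function directly, including the limit rules described in the code comments. The split wrapper, the csv sample string, and the result variable names are my own illustrations; the wrapper falls back to the native method when cbSplit hasn't been loaded, so the snippet runs on its own.

```javascript
// Use `cbSplit` if the script above has been loaded; otherwise fall back to the
// native method. `split` is a hypothetical helper name used only in this example.
var split = typeof cbSplit === "function" ? cbSplit : function (str, separator, limit) {
    return String.prototype.split.call(str, separator, limit);
};

var csv = "a, b,c ,d";

// no limit -> ["a", "b", "c", "d"]
var allValues = split(csv, /\s*,\s*/);

// a positive limit caps the length of the result -> ["a", "b"]
var firstTwo = split(csv, /\s*,\s*/, 2);

// zero (or NaN) returns an empty array -> []
var noValues = split(csv, /\s*,\s*/, 0);

// a negative limit means no limit -> ["a", "b", "c", "d"]
var unlimited = split(csv, /\s*,\s*/, -1);
```

Calling cbSplit this way is handy if you'd rather not replace String.prototype.split; in that case, leave out the "for convenience" override at the end of the script.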

Download it.

Please let me know if you find any problems. Thanks!

Update: This script has become part of my XRegExp library, which includes many other JavaScript regular expression cross-browser compatibility fixes.

There Are 37 Responses So Far.

  1. yeah so i found your page after wondering why tf a very simple regex was returning different results in Opera and Mozilla – i thought i was going insane, until finding posts like this on your blog – when i see stuff like “runs on cmucl, allegro, sbcl, LispWorks, OpenMCL”, i wonder…what did the LISP guys do that browser guys have such trouble with..

  2. Thanks. I used your script and it saved me a huge headache with IE not treating splits like other browsers do. This script was very well done, and I liked your validation page, also very useful.

  3. [...] Long story short, if you’re running into problems with your split method in any browser, chances are this script fixes it [...]

  4. Hey this code is great, excellent job! I made some optimizations for my particular use cases because I was worried about performance using this implementation versus the native one. One thing I do often is split many (hundreds or thousands) strings with the same RegExp object and your code is reconstructing separator up to two times per split. To make this faster I added some object caching on the separator parameter so it will only reconstruct the regex the first time you split with it. Also, since cross-browser behavior with string separators is consistent I just made it use the native implementation if separator isn’t an instance of RegExp. It still passes your test page with flying colors, though I only tested Firefox 2, Safari 3, and IE6. Drop me a line if you’d like to check out the changes and possibly absorb them into your copy.

  5. Marcel, I’m interested. I’d already planned to change this to use the native split method for non-regex separators if I ever got around to updating it. As for caching to avoid regex recompilation, some browsers might do that automagically, so I’d be interested in testing exactly how it affects each of the major browsers before making such a change. Finally, I believe my script might fail the test page in KHTML (as opposed to WebKit) -based browsers such as Konqueror. If that’s the case, I’d want to look into how to address that (if at all possible) before re-releasing. I’ll send you an email.

  6. Just want to say thanks for the script, it works perfectly.

  7. Thank you, just, thank you.

  8. Thank you very much for this work. This keeps my simple cross browser project simple.

  9. Dude,

    You saved my bacon with this one! Been fighting this for a couple of days and ran across your script this morning. Fired it off and BAM! worked the first time with a RegExp that worked great in FireFox but was tanking in IE.

    Thanks again.

  10. I’m happy to hear that this has helped you all!

    I’ve just modified the script to use the native split method when non-regex separators are provided, in order to run a little faster in such cases. No other significant changes were made.

  11. I’d replace:
    var nativeSplit = nativeSplit || String.prototype.split;
    for
    String.prototype._split = String.prototype._split || String.prototype.split;

    So you don’t pollute the window with globals..

    Hope that helps

  12. @Ariel Flesler:

    Moving the namespace pollution from the window object to the String.prototype object (which is also available globally) makes things worse, IMO. And while you could wrap all of the code in an anonymous function to avoid adding any global variables, I think there is some benefit to keeping the native version available to other code, for testing purposes if nothing else. As for the name “_split”, I intentionally avoided that because I think it’s more likely to collide with other libraries which might do something similar.

    For the record, the reason I do nativeSplit = nativeSplit || String.prototype.split instead of just nativeSplit = String.prototype.split is because otherwise, running the code twice would break the reference to the native global.

  13. Have you considered trying to get this implemented in one of the framework libraries?

  14. Well, it’s out there, and MIT licensed. Other libraries are welcome to use it if they’d like to. Incidentally, a slightly modified version of this code will be included in the next version of my XRegExp library.

  15. Can you give some examples of how you call this? It is just not clear how to implement.

    Many Thanks

  16. @Dale, it just overrides the native split method, so you can use it as simply as something like this:

    var numbers = "1:2:3".split(/:/);
    // -> ["1","2","3"]

    or…

    var numbers = "1:2:3".split(/(:)/);
    // -> ["1",":","2",":","3"]

    Refer to the Mozilla Developer Center for more info.

  17. [...] expressions as delimiters. Other people have had this problem before and Steven Levithan has a nice article about the topic including a script which fixes the inconsistencies between the different browsers, [...]

  18. Afternoon Steve,
    Thanks for this great script! Spent a few hours trying to get a split and regex working properly… after many google searches I found this site. I linked your script, used the proper syntax and *POOF it worked precisely as I needed it to!

    Thanks man!!

    You dont have a donate box, otherwise I would donate some $$ for your efforts!

  19. @Tim Lavelle, thanks!

    I’ve just upgraded this script to v0.3. Hallvord Steen of Opera helped me spot an issue with the previous version. When using String.prototype.split, if the last match of the separator within the subject string ended at the end of the string, and the separator was capable of matching an empty string (e.g. with /a?/), a trailing empty string value was not appended to the result array even when the separator did not match an empty string in that last case. This followed Firefox’s native handling, but not the spec (which at least Opera follows correctly).

    The new version of the script fixes this error. I’ve also updated the test page accordingly.

  20. Please forgive me for my ignorance, but how do I use this script and call the function? It looks like it’ll solve all my problems, but I’m missing how to use this to “split” out my variable?

    Thanks.

  21. i really appreciate it!!

  22. I am trying to filter the values from a textarea and pass each line into an array so that it populates the of a box. This works fine in Mozilla and IE 7.0, but in lower versions of IE it doesn’t work; it just selects the first line and rejects the rest.

    This is the content of the textarea , I want to make it check for new lines and split with \n then add to an array

    9845747594
    4545454545
    5454656565
    6565656566

    function OnClickAddNumber()
    {
    	var	strValue	= document.SMS.Number.value;
    	var separator = '\n';
    	var strValue_array = strValue.split(separator);
    	var len = strValue_array.length;
    
    	var i = 0;
    	var mumu=1;
    
    		for(var i=0; i 500)
    			{
    				alert("You Tried forwarding " + len + " Numbers to the destination box \n But only  500 Number was accepted. \n Its a Rule not an error Okay" )
    				return;
    			}
    			var	iCurNode	= 0;
    			var	oNode		= null;
    			if (strValue_array[i].length < 4 )
    				return;
    			strValue_array[i] += " (Number)"
    			oNode = document.createElement("option");
    			if (oNode == null)
    				return;
    			oNode.appendChild(document.createTextNode(strValue_array[i]));
    			for (var iCurNode = 0; iCurNode < document.SMS.Recipients.length; iCurNode++)
    			{
    				if (strValue_array[i].toLowerCase() = document.SMS.Recipients.length || document.SMS.Recipients.length == 0)
    			{
    				document.SMS.Recipients.appendChild(oNode);
    				document.SMS.Recipients.options[(document.SMS.Recipients.length-1)].selected = true;
    			}
    			mumu+=1;
    		}
    
    	//document.SMS.Number.value = "";
    }

  23. [...] http://blog.stevenlevithan.com/archives/cross-browser-split [...]

  24. Please note that this script fails (in IE of course) if the split is done on a ~

    It runs native as it’s not an instance of RegEx and adding an exception for ~ causes IE to spit out thousands of splits instead of 2 and causes FireFox’s to fail completely (though it was working fine prior). I’m going to spend some time with it, see if I can’t figure out the problem. I’ll let you know.

  25. @JMJimmy, I can’t reproduce the issue in IE8. Which version of IE are you using? Can you provide a script to reproduce the issue?

    As you mentioned, if you split on matches of the string “~” this code will just pass the handling off to the native String.prototype.split. So, if there is a problem, it’s likely an issue that would occur in IE anyway.

  26. Thanks, good script!
    Why you use concat method of Array? using “push” instead may improve performance a little.

  27. I’ve just updated this script from version 0.3 to 1.0. The new version includes significant refactoring, and fixes a bug where the limit argument was not always followed consistently.

  28. I really, truly cannot thank you enough for this. Of all the bullshit we have to put up with in Javascript, rewriting String.split must be up there with the worst. :)

    Seriously, I owe you a pint, you just saved me a few hours… :)

    David

  29. Carmen, it’s not just us Lisp guys, though we do like to brag about it. Most C libraries will compile on all the major C compilers (GNU, Microsoft, Intel, etc.) as well. And it’s not because of gratuitous CPP macros, either: Plan 9 builds on all platforms without any #if/#ifdef at all (and in fact the native CPP doesn’t even *have* #if).

    What’s left to say? People who write web browsers are really creative — they found ways for things to break that nobody in 50 years of computing had thought of. :-)

  30. Thanks, it’s a must have

  31. Saved my bacon – thanks a million for sharing this with us all. I’ve just finished a little Javascript site – http://www.nathaliemiquel-bijoux.fr – that reads a csv table the owner can modify to update the content, and it worked fine in Firefox, Opera, Safari and Chrome, but didn’t even load in IE. Just linked to your split.js file before mine in the header and it works perfectly everywhere!

  32. I know this post is old (although the latest update to the code, according to the comments, was almost 1 year ago), but still, this script saved me from spending the rest of the day trying to figure out why my code doesn’t work (and then to find out that it’s IE’s fault). Thanks a lot!

  33. [...] totusi sa vad daca a mai intalnit cineva acest caz si se pare ca nu am fost singurul ghinionist. Am dat cu ocazia asta si peste o extensie care corecteaza functionalitatea functiei split in mai multe [...]

  34. Great! Great! Great! You’re the Man… Thanks a lot

  35. Thanks. Great post. It helped me identify why I had a bug in Chrome.

  36. Thanks for the sanity check (and reliable solution).

    -j

  37. Excellent solution! Top notch! Instantly solved an issue I was having using split() with Firefox.

    Thank you Sir!

Post a Response

If you are about to post code, please escape your HTML entities (&amp;, &gt;, &lt;).