JavaScript split Bugs: Fixed!

The String.prototype.split method is very handy, so it's a shame that when you use a regular expression as its delimiter, the results can differ so wildly cross-browser that odds are you've just introduced bugs into your code (unless you know precisely what kind of data you're working with and can avoid the issues). Here's one example of other people venting about the problems. These are the cross-browser inconsistencies when using regexes with split:

  • Internet Explorer excludes almost all empty values from the resulting array (e.g., when two delimiters appear next to each other in the data, or when a delimiter appears at the start or end of the data). This doesn't make any sense to me, since IE does include empty values when using a string as the delimiter.
  • Internet Explorer and Safari do not splice the values of capturing parentheses into the returned array (this functionality can be useful with simple parsers, etc.).
  • Firefox does not splice undefined values into the returned array as the result of non-participating capturing groups.
  • Internet Explorer, Firefox, and Safari have various additional edge-case bugs where they do not follow the split specification (which is actually quite complex).
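For reference, this is what a spec-compliant engine returns for the cases listed above. It is a minimal sketch that assumes a modern engine (any current browser or Node.js) whose native split already follows the spec. The variable names are only illustrative.

```javascript
// Spec-compliant results for the problem cases above.
// Assumes a modern engine whose native String.prototype.split follows the spec.

// Empty values are kept: between adjacent delimiters, and at the start and end.
const emptyValues = ",a,,b,".split(/,/);
// -> ["", "a", "", "b", ""]

// Values matched by capturing groups are spliced into the result.
const captured = "a1b2c".split(/(\d)/);
// -> ["a", "1", "b", "2", "c"]

// A capturing group that does not participate in a match contributes undefined.
const nonParticipating = "a-b_c".split(/(-)|(_)/);
// -> ["a", "-", undefined, "b", undefined, "_", "c"]

console.log(emptyValues, captured, nonParticipating);
```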

The situation is so bad that I've simply avoided using regex-based splitting in the past.

That ends now. 😉

The following script provides a fast, uniform cross-browser implementation of String.prototype.split and attempts to follow the relevant spec precisely (ECMA-262 3rd edition, §15.5.4.14, pp. 103–104).
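At its core, a spec-following regex split is a single exec loop. It records the text between accepted matches, splices in any captured groups, ignores a zero-length match at the previous split point, and steps past zero-length matches so the loop terminates. The following is a minimal sketch of that idea. It assumes a modern engine with a compliant exec, and it leaves out the limit argument and the old-browser workarounds. The name simpleRegexSplit is hypothetical and is not part of the script below.

```javascript
// Simplified spec-style regex split built on an exec loop.
// Assumptions: compliant RegExp.prototype.exec, no `limit` argument, and a
// separator without the `y` or `u` flags. `simpleRegexSplit` is a hypothetical name.
function simpleRegexSplit(str, separator) {
  // Work on a global copy so the caller's regex and its lastIndex are untouched.
  const re = new RegExp(separator.source, separator.flags.replace(/[gy]/g, "") + "g");
  // Spec rule for an empty string: [] if the separator can match it, otherwise [""].
  if (str === "") {
    return re.test("") ? [] : [""];
  }
  const output = [];
  let lastLastIndex = 0; // end of the previous accepted match
  let match;
  // The spec never tries a match at the very end of the string, so stop there.
  while ((match = re.exec(str)) !== null && match.index < str.length) {
    const lastIndex = match.index + match[0].length;
    // Ignore a zero-length match at the previous split point; it would add an empty piece.
    if (lastIndex > lastLastIndex) {
      output.push(str.slice(lastLastIndex, match.index));
      output.push(...match.slice(1)); // capturing-group values (undefined if unmatched)
      lastLastIndex = lastIndex;
    }
    // A zero-length match leaves lastIndex in place; step past it to avoid an infinite loop.
    if (re.lastIndex === match.index) {
      re.lastIndex++;
    }
  }
  output.push(str.slice(lastLastIndex)); // text after the last separator (may be "")
  return output;
}
```

In a modern engine, its results should agree with the native method. For example, simpleRegexSplit("hello world", /\b/) and "hello world".split(/\b/) both return ["hello", " ", "world"].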

I've also created a fairly quick and dirty page where you can test the results of more than 50 usages of JavaScript's split method and quickly compare your browser's results with the correct implementation. On the test page, pink lines in the third column highlight incorrect results from the native split method. The rightmost column shows the results of the script below. It's all green in every browser I've tested (IE 5.5–7, Firefox 2.0.0.4, Opera 9.21, Safari 3.0.1 beta, and Swift 0.2).
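A test page like this boils down to a table of inputs paired with their spec-correct outputs. The following is a rough sketch of that idea and is not the actual test page. The names splitCases, sameArray, and findSplitFailures are hypothetical, and the expected values follow the spec.

```javascript
// Hypothetical test table: [input, separator, limit, expected].
const splitCases = [
  ["a,,b", /,/, undefined, ["a", "", "b"]],
  [",a,", /,/, undefined, ["", "a", ""]],
  ["a1b2c", /(\d)/, undefined, ["a", "1", "b", "2", "c"]],
  ["a-b_c", /(-)|(_)/, undefined, ["a", "-", undefined, "b", undefined, "_", "c"]],
  ["a b c d", " ", 2, ["a", "b"]],
  ["", /,/, undefined, [""]],
  ["", /x*/, undefined, []],
];

// Strict element-by-element comparison, so undefined and "" are told apart.
function sameArray(a, b) {
  return a.length === b.length && a.every((value, i) => value === b[i]);
}

// Collects every case where splitFn disagrees with the expected result.
function findSplitFailures(splitFn, cases) {
  const failures = [];
  for (const [input, separator, limit, expected] of cases) {
    const actual = splitFn(input, separator, limit);
    if (!sameArray(actual, expected)) {
      failures.push({ input, separator: String(separator), limit, expected, actual });
    }
  }
  return failures;
}

// Check the engine's native method; pass a split(str, separator, limit) function to test it instead.
const nativeFailures = findSplitFailures((str, sep, lim) => str.split(sep, lim), splitCases);
console.log(nativeFailures.length === 0 ? "all cases pass" : nativeFailures);
```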

Run the tests in your browser.

Here's the script:

/*!
 * Cross-Browser Split 1.1.1
 * Copyright 2007-2012 Steven Levithan <stevenlevithan.com>
 * Available under the MIT License
 * ECMAScript compliant, uniform cross-browser split method
 */

/**
 * Splits a string into an array of strings using a regex or string separator. Matches of the
 * separator are not included in the result array. However, if `separator` is a regex that contains
 * capturing groups, backreferences are spliced into the result each time `separator` is matched.
 * Fixes browser bugs compared to the native `String.prototype.split` and can be used reliably
 * cross-browser.
 * @param {String} str String to split.
 * @param {RegExp|String} separator Regex or string to use for separating the string.
 * @param {Number} [limit] Maximum number of items to include in the result array.
 * @returns {Array} Array of substrings.
 * @example
 *
 * // Basic use
 * split('a b c d', ' ');
 * // -> ['a', 'b', 'c', 'd']
 *
 * // With limit
 * split('a b c d', ' ', 2);
 * // -> ['a', 'b']
 *
 * // Backreferences in result array
 * split('..word1 word2..', /([a-z]+)(\d+)/i);
 * // -> ['..', 'word', '1', ' ', 'word', '2', '..']
 */
var split;

// Avoid running twice; that would break the `nativeSplit` reference
split = split || function (undef) {

    var nativeSplit = String.prototype.split,
        compliantExecNpcg = /()??/.exec("")[1] === undef, // NPCG: nonparticipating capturing group
        self;

    self = function (str, separator, limit) {
        // If `separator` is not a regex, use `nativeSplit`
        if (Object.prototype.toString.call(separator) !== "[object RegExp]") {
            return nativeSplit.call(str, separator, limit);
        }
        var output = [],
            flags = (separator.ignoreCase ? "i" : "") +
                    (separator.multiline  ? "m" : "") +
                    (separator.extended   ? "x" : "") + // Proposed for ES6
                    (separator.sticky     ? "y" : ""), // Firefox 3+
            lastLastIndex = 0,
            // Make `global` and avoid `lastIndex` issues by working with a copy
            separator = new RegExp(separator.source, flags + "g"),
            separator2, match, lastIndex, lastLength;
        str += ""; // Type-convert
        if (!compliantExecNpcg) {
            // Doesn't need flags gy, but they don't hurt
            separator2 = new RegExp("^" + separator.source + "$(?!\\s)", flags);
        }
        /* Values for `limit`, per the spec:
         * If undefined: 4294967295 // Math.pow(2, 32) - 1
         * If 0, Infinity, or NaN: 0
         * If positive number: limit = Math.floor(limit); if (limit > 4294967295) limit -= 4294967296;
         * If negative number: 4294967296 - Math.floor(Math.abs(limit))
         * If other: Type-convert, then use the above rules
         */
        limit = limit === undef ?
            -1 >>> 0 : // Math.pow(2, 32) - 1
            limit >>> 0; // ToUint32(limit)
        while (match = separator.exec(str)) {
            // `separator.lastIndex` is not reliable cross-browser
            lastIndex = match.index + match[0].length;
            if (lastIndex > lastLastIndex) {
                output.push(str.slice(lastLastIndex, match.index));
                // Fix browsers whose `exec` methods don't consistently return `undefined` for
                // nonparticipating capturing groups
                if (!compliantExecNpcg && match.length > 1) {
                    match[0].replace(separator2, function () {
                        for (var i = 1; i < arguments.length - 2; i++) {
                            if (arguments[i] === undef) {
                                match[i] = undef;
                            }
                        }
                    });
                }
                if (match.length > 1 && match.index < str.length) {
                    Array.prototype.push.apply(output, match.slice(1));
                }
                lastLength = match[0].length;
                lastLastIndex = lastIndex;
                if (output.length >= limit) {
                    break;
                }
            }
            if (separator.lastIndex === match.index) {
                separator.lastIndex++; // Avoid an infinite loop
            }
        }
        if (lastLastIndex === str.length) {
            if (lastLength || !separator.test("")) {
                output.push("");
            }
        } else {
            output.push(str.slice(lastLastIndex));
        }
        return output.length > limit ? output.slice(0, limit) : output;
    };

    // For convenience
    String.prototype.split = function (separator, limit) {
        return self(this, separator, limit);
    };

    return self;

}();
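The script handles limit with an unsigned right shift. The expression x >>> 0 applies the spec's ToUint32 conversion, which is where the rules in the comment above come from. The following snippet checks those rules and shows their effect on split. It assumes a modern engine with a compliant native method, and toUint32 is a hypothetical helper that is not part of the script.

```javascript
// `x >>> 0` converts x with ToUint32: type-convert, truncate toward zero, wrap modulo 2^32.
const toUint32 = (x) => x >>> 0;

const limits = {
  minusOne: toUint32(-1),                  // 4294967295, i.e. Math.pow(2, 32) - 1
  fraction: toUint32(2.9),                 // 2
  notANumber: toUint32(NaN),               // 0
  infinity: toUint32(Infinity),            // 0
  numericString: toUint32("3"),            // 3 (type-converted first)
  tooLarge: toUint32(Math.pow(2, 32) + 1), // 1 (wraps around)
  missing: toUint32(undefined),            // 0, which is why the script maps undefined to -1 >>> 0
};

// Effect on split, assuming an engine whose native method follows the spec:
const zeroLimit = "a b c".split(" ", 0);         // []
const fractionalLimit = "a b c".split(" ", 2.9); // ["a", "b"]
const negativeLimit = "a b c".split(" ", -1);    // ["a", "b", "c"]

console.log(limits, zeroLimit, fractionalLimit, negativeLimit);
```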

Download it.

Please let me know if you find any problems. Thanks!

Update: This script has become part of my XRegExp library, which includes many other JavaScript regular expression cross-browser compatibility fixes.

81 thoughts on “JavaScript split Bugs: Fixed!”

  1. Thanks, good script!
    Why do you use Array's concat method? Using “push” instead may improve performance a little.

  2. I’ve just updated this script from version 0.3 to 1.0. The new version includes significant refactoring, and fixes a bug where the limit argument was not always followed consistently.

  3. I really, truly cannot thank you enough for this. Of all the bullshit we have to put up with in Javascript, rewriting String.split must be up there with the worst. 🙂

    Seriously, I owe you a pint, you just saved me a few hours… 🙂

    David

  4. Carmen, it’s not just us Lisp guys, though we do like to brag about it. Most C libraries will compile on all the major C compilers (GNU, Microsoft, Intel, etc.) as well. And it’s not because of gratuitous CPP macros, either: Plan 9 builds on all platforms without any #if/#ifdef at all (and in fact the native CPP doesn’t even *have* #if).

    What’s left to say? People who write web browsers are really creative — they found ways for things to break that nobody in 50 years of computing had thought of. 🙂

  5. Saved my bacon – thanks a million for sharing this with us all. I’ve just finished a little Javascript site – http://www.nathaliemiquel-bijoux.fr – that reads a csv table the owner can modify to update the content, and it worked fine in Firefox, Opera, Safari and Chrome, but didn’t even load in IE. Just linked to your split.js file before mine in the header and it works perfectly everywhere!

  6. I know this post is old (although the latest update to the code, according to the comments, was almost 1 year ago), but still, this script saved me from spending the rest of the day trying to figure out why my code doesn’t work (and then to find out that it’s IE’s fault). Thanks a lot!

  7. Excellent solution! Top notch! Instantly solved an issue I was having using split() with Firefox.

    Thank you Sir!

  8. This is awesome. Works great on fixing “split” (which was my immediate issue), and I wonder how many other incompatibilities I’m never even going to see now that I’ve dropped in your script. Thank you!

  9. Seriously, this script is amazing – saved me such a headache – I simply attached it to my document and my regex split magically worked in IE – cue fist pump and virtual high five!

  10. Thanks for this code — awesome!

    You probably already know this, but I note that IE9 produces correct results on every test. Compatibility mode produces the ‘correct’ incorrect results, as well.

    Not much use to us while there are still so many non-compliant browsers out there, but kudos to Microsoft where it’s due (for once).

  11. Wow, I was just running into compatibility issues between IE and Chrome (Chrome passes all the tests correctly). I was dreading having to write my own split, and you’ve already done it brilliantly. Thank you so much.
