Flagrant Badassery

A JavaScript and regular expression centric blog

JavaScript split Bugs: Fixed!

The String.prototype.split method is very handy, so it's a shame that if you use a regular expression as its delimiter, the results can be so wildly different cross-browser that odds are you've just introduced bugs into your code (unless you know precisely what kind of data you're working with and are able to avoid the issues). Other people have vented about the problems, too. Here are the cross-browser inconsistencies when using regexes with split:

  • Internet Explorer excludes almost all empty values from the resulting array (e.g., when two delimiters appear next to each other in the data, or when a delimiter appears at the start or end of the data). This doesn't make any sense to me, since IE does include empty values when using a string as the delimiter.
  • Internet Explorer and Safari do not splice the values of capturing parentheses into the returned array (this functionality can be useful with simple parsers, etc.).
  • Firefox does not splice undefined values into the returned array as the result of non-participating capturing groups.
  • Internet Explorer, Firefox, and Safari have various additional edge-case bugs where they do not follow the split specification (which is actually quite complex).
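For comparison, these are the results the spec calls for in the cases listed above. This is a small sketch that assumes a modern, spec-compliant engine (for example, a current Node.js), where the native method gets them right; the variable names are only for illustration.

```javascript
// Results the spec requires; a compliant engine's native split produces them.

// Empty values are kept: adjacent delimiters, and delimiters at the start or end.
const emptyValues = ",a,,b,".split(/,/);
// -> ["", "a", "", "b", ""]

// Values of capturing groups are spliced into the output.
const withCaptures = "a1b2c".split(/(\d)/);
// -> ["a", "1", "b", "2", "c"]

// A nonparticipating capturing group contributes undefined, not "".
const withNpcg = "ab".split(/(x)?/);
// -> ["a", undefined, "b"]

// A string delimiter keeps empty values as well.
const stringDelimiter = ",a,,b,".split(",");
// -> ["", "a", "", "b", ""]
```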

The situation is so bad that I've simply avoided using regex-based splitting in the past.

That ends now.

The following script provides a fast, uniform cross-browser implementation of String.prototype.split, and attempts to precisely follow the relevant spec (ECMA-262 v3 §15.5.4.14, pp.103,104).

I've also created a fairly quick and dirty page where you can test the result of more than 50 usages of JavaScript's split method and quickly compare your browser's results with the correct implementation. On the test page, the pink lines in the third column highlight incorrect results from the native split method. The rightmost column shows the results of the script below. It's all green in every browser I've tested (IE 5.5 – 7, Firefox 2.0.0.4, Opera 9.21, Safari 3.0.1 beta, and Swift 0.2).

Run the tests in your browser.
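The test page itself isn't reproduced here, but the idea behind it is easy to sketch: run a list of split cases through a split function and collect the ones whose output differs from the spec-mandated result. The `checkSplit` and `sameArray` helpers, the case list, and the `dropsEmpty` stand-in for a buggy browser are all hypothetical, and the expected values assume spec semantics.

```javascript
// Hypothetical cases in the spirit of the test page, with spec-mandated results.
const cases = [
  { str: ",a,,b,", sep: /,/, expected: ["", "a", "", "b", ""] },
  { str: "ab", sep: /(x)?/, expected: ["a", undefined, "b"] },
  { str: "a b c d", sep: " ", limit: 2, expected: ["a", "b"] },
  { str: "", sep: /,/, expected: [""] },
  { str: "", sep: /(?:)/, expected: [] },
  { str: "test", sep: /(?:)/, expected: ["t", "e", "s", "t"] },
];

// Element-wise comparison with ===, so undefined and "" are not confused.
function sameArray(a, b) {
  return a.length === b.length && a.every((v, i) => v === b[i]);
}

// Returns the cases whose output differs from the expected result.
// `splitFn` takes (str, separator, limit), like the script's split function.
function checkSplit(splitFn) {
  return cases.filter(
    (c) => !sameArray(splitFn(c.str, c.sep, c.limit), c.expected)
  );
}

// Native split, wrapped to the (str, separator, limit) signature.
const nativeFailures = checkSplit((s, sep, lim) => s.split(sep, lim));

// A deliberately broken split that drops every empty value, roughly
// imitating the Internet Explorer behavior described above.
const dropsEmpty = (s, sep, lim) => s.split(sep, lim).filter((v) => v !== "");
const dropsEmptyFailures = checkSplit(dropsEmpty);

console.log(nativeFailures.length);     // 0 in a compliant engine
console.log(dropsEmptyFailures.length); // 2 (the ",a,,b," and "" cases)
```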

Here's the script:

/*!
 * Cross-Browser Split 1.1.1
 * Copyright 2007-2012 Steven Levithan <stevenlevithan.com>
 * Available under the MIT License
 * ECMAScript compliant, uniform cross-browser split method
 */

/**
 * Splits a string into an array of strings using a regex or string separator. Matches of the
 * separator are not included in the result array. However, if `separator` is a regex that contains
 * capturing groups, backreferences are spliced into the result each time `separator` is matched.
 * Fixes browser bugs compared to the native `String.prototype.split` and can be used reliably
 * cross-browser.
 * @param {String} str String to split.
 * @param {RegExp|String} separator Regex or string to use for separating the string.
 * @param {Number} [limit] Maximum number of items to include in the result array.
 * @returns {Array} Array of substrings.
 * @example
 *
 * // Basic use
 * split('a b c d', ' ');
 * // -> ['a', 'b', 'c', 'd']
 *
 * // With limit
 * split('a b c d', ' ', 2);
 * // -> ['a', 'b']
 *
 * // Backreferences in result array
 * split('..word1 word2..', /([a-z]+)(\d+)/i);
 * // -> ['..', 'word', '1', ' ', 'word', '2', '..']
 */
var split;

// Avoid running twice; that would break the `nativeSplit` reference
split = split || function (undef) {

    var nativeSplit = String.prototype.split,
        compliantExecNpcg = /()??/.exec("")[1] === undef, // NPCG: nonparticipating capturing group
        self;

    self = function (str, separator, limit) {
        // If `separator` is not a regex, use `nativeSplit`
        if (Object.prototype.toString.call(separator) !== "[object RegExp]") {
            return nativeSplit.call(str, separator, limit);
        }
        var output = [],
            flags = (separator.ignoreCase ? "i" : "") +
                    (separator.multiline  ? "m" : "") +
                    (separator.extended   ? "x" : "") + // Proposed for ES6
                    (separator.sticky     ? "y" : ""), // Firefox 3+
            lastLastIndex = 0,
            // Make `global` and avoid `lastIndex` issues by working with a copy
            separator = new RegExp(separator.source, flags + "g"),
            separator2, match, lastIndex, lastLength;
        str += ""; // Type-convert
        if (!compliantExecNpcg) {
            // Doesn't need flags gy, but they don't hurt
            separator2 = new RegExp("^" + separator.source + "$(?!\\s)", flags);
        }
        /* Values for `limit`, per the spec:
         * If undefined: 4294967295 // Math.pow(2, 32) - 1
         * If 0, Infinity, or NaN: 0
         * If positive number: limit = Math.floor(limit); if (limit > 4294967295) limit -= 4294967296;
         * If negative number: 4294967296 - Math.floor(Math.abs(limit))
         * If other: Type-convert, then use the above rules
         */
        limit = limit === undef ?
            -1 >>> 0 : // Math.pow(2, 32) - 1
            limit >>> 0; // ToUint32(limit)
        while (match = separator.exec(str)) {
            // `separator.lastIndex` is not reliable cross-browser
            lastIndex = match.index + match[0].length;
            if (lastIndex > lastLastIndex) {
                output.push(str.slice(lastLastIndex, match.index));
                // Fix browsers whose `exec` methods don't consistently return `undefined` for
                // nonparticipating capturing groups
                if (!compliantExecNpcg && match.length > 1) {
                    match[0].replace(separator2, function () {
                        for (var i = 1; i < arguments.length - 2; i++) {
                            if (arguments[i] === undef) {
                                match[i] = undef;
                            }
                        }
                    });
                }
                if (match.length > 1 && match.index < str.length) {
                    Array.prototype.push.apply(output, match.slice(1));
                }
                lastLength = match[0].length;
                lastLastIndex = lastIndex;
                if (output.length >= limit) {
                    break;
                }
            }
            if (separator.lastIndex === match.index) {
                separator.lastIndex++; // Avoid an infinite loop
            }
        }
        if (lastLastIndex === str.length) {
            if (lastLength || !separator.test("")) {
                output.push("");
            }
        } else {
            output.push(str.slice(lastLastIndex));
        }
        return output.length > limit ? output.slice(0, limit) : output;
    };

    // For convenience
    String.prototype.split = function (separator, limit) {
        return self(this, separator, limit);
    };

    return self;

}();
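The heart of the script is a `while (match = separator.exec(str))` loop over a global copy of the separator. Because a regex can match the empty string, the loop bumps `lastIndex` itself whenever a match doesn't advance it; otherwise `exec` would keep returning the same empty match forever. Here is just that pattern in isolation, as a simplified sketch for modern engines: the `matchSpans` function is hypothetical and not part of the script, and it relies on the ES2015 `flags` property and a reliable `lastIndex`, which the script deliberately avoids for older browsers.

```javascript
// Collect the [start, end) span of every match of `regex` in `str`.
// A global copy is used so the caller's regex and its lastIndex stay untouched.
function matchSpans(str, regex) {
  const flags = regex.flags.includes("g") ? regex.flags : regex.flags + "g";
  const re = new RegExp(regex.source, flags);
  const spans = [];
  let match;
  while ((match = re.exec(str)) !== null) {
    spans.push([match.index, match.index + match[0].length]);
    // After a zero-length match, lastIndex equals match.index; step past it
    // to avoid an infinite loop (the same guard the script uses).
    if (re.lastIndex === match.index) {
      re.lastIndex++;
    }
  }
  return spans;
}

console.log(JSON.stringify(matchSpans("a,b", /,/))); // [[1,2]]
console.log(JSON.stringify(matchSpans("ab", /x?/))); // [[0,0],[1,1],[2,2]]
```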

Download it.
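The `compliantExecNpcg` feature test works because `()??` puts a lazy optional quantifier on an empty group: the engine first tries to skip the group, the (otherwise empty) pattern then succeeds, and so the group never participates. A compliant `exec` must report such a group as `undefined`, not as an empty string. A standalone version of the check, with a hypothetical `describeGroup` helper showing the distinction, might look like this:

```javascript
// The same feature test the script uses: true when exec reports a
// nonparticipating capturing group as undefined.
const compliantExecNpcg = /()??/.exec("")[1] === undefined;

// Hypothetical helper: tells a group that matched "" apart from one that
// didn't take part in the match at all.
function describeGroup(regex, str, groupIndex) {
  const match = regex.exec(str);
  if (match === null) return "no match";
  const value = match[groupIndex];
  return value === undefined ? "nonparticipating" : JSON.stringify(value);
}

console.log(compliantExecNpcg);                // true in a modern engine
console.log(describeGroup(/a(x?)b/, "ab", 1)); // "" (participated, matched the empty string)
console.log(describeGroup(/a(x)?b/, "ab", 1)); // nonparticipating
```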

Please let me know if you find any problems. Thanks!
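The `limit` rules listed in the script's comment block all fall out of the spec's ToUint32 conversion, which `limit >>> 0` performs; that is also why a huge limit such as `Math.pow(2, 32) + 1` wraps around to a tiny one. A quick sketch of the conversion (the `toUint32` name is just a label for that expression), assuming a spec-compliant native split for the last two lines:

```javascript
// ToUint32, as applied by the script through `limit >>> 0`.
const toUint32 = (n) => n >>> 0;

console.log(toUint32(undefined));           // 0 (the script special-cases undefined)
console.log(toUint32(-1));                  // 4294967295, i.e. Math.pow(2, 32) - 1
console.log(toUint32(2.7));                 // 2
console.log(toUint32(-2.5));                // 4294967294
console.log(toUint32(Infinity));            // 0
console.log(toUint32(NaN));                 // 0
console.log(toUint32(Math.pow(2, 32) + 1)); // 1

// Consequently, very large limits behave like very small ones.
console.log("a b c d".split(" ", Math.pow(2, 32) + 1).length); // 1
console.log("a b c d".split(" ", Infinity).length);            // 0
```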

Update: This script has become part of my XRegExp library, which includes many other JavaScript regular expression cross-browser compatibility fixes.

There Are 77 Responses So Far.

  1. […] JavaScript split Bugs: Fixed! I’ve also created a fairly quick and dirty page where you can test the result of more than 50 usages of JavaScript’s split method, and quickly … […]

  2. Do you have a fix for the RegExp exec method in IE? It returns an empty string on a failed match instead of undefined.

  3. Never mind, XRegExp has solved it all. Thanks for the great regexp tool.

  4. […] regular expression behave so differently in two methods? In his article JavaScript split Bugs: Fixed!, Steven Levithan describes a whole series of bugs (not only affecting Internet Explorer) and provides […]

  5. […] This script helped deal with empty csv values in IE: […]

  6. Beautiful! Thank you!

  7. Thanks for this, it is just what I needed. I was splitting a date range string like so: “my_field:[2012-01-03T00:00:00Z TO 2012-01-24T23:59:59.999Z]” on the first occurrence of a colon with /\:(.*)?/, and it was working in non-IE browsers. IE was simply splitting on the first colon and discarding the rest of the string.

    cbSplit fixes this and I am eternally grateful.

    Alex

  8. Thank you for your great work!
    I ported it to CoffeeScript.
    https://gist.github.com/2015450

  9. @Tsutomu Kawamura, cool. 🙂

    I’ve just upgraded this script from v1.0.1 to v1.1. This fixes how the script handles very large numbers provided as the limit argument (e.g., Infinity or Math.pow(2,32)+1). They are now converted to very small numbers, per the spec rules. (Issue reported by Brian O.)

    I’ve also changed the function name from cbSplit to split.

  10. […] [Edit] From Steven Levithan’s Blog: […]

  12. OK. This works! Great job.

    This is the first time I’ve ever encountered a JavaScript bug. I’ve dealt with so many cross-browser inconsistencies arising from HTML and CSS that I honestly believed they were the only kind out there. Nary a single JavaScript bug.

    Now I have to Google “JavaScript bugs” and find out about all the unpleasantries I’ve been missing out on 🙁

    PS: To anybody with a useful site that serves that purpose, please provide a link.

  13. Thank you, dude. Good job!!

  14. Hi! Thanks for the script.
    I have a bug in IE8, though. I’m doing a background-position animation.
    It splits background-position into x and y.
    Preview:
    x = ele.css('background-position'),
    y = x.split(' '),
    z = parseInt(y[1].replace(/px/, ''))+157;

    Your script fails to help me solve this. Do you know why?

  15. @Xander, your example code isn’t helpful in determining what the issue with the split is, if any. All that is relevant is the actual value of the target (the value of your x variable, which is not shown), the separator, the optional limit argument, and the output (again, not shown in your code). Also note that to use the latest version of this function, you need to call the global split function itself, not the native split method of strings (which is not overridden). I.e., you should call split(x, ' '). Finally, note that the split function simply passes to the native String.prototype.split method if the separator is not a regex.

  16. Woo! Thank you very much! It works! My program works in IE now!

  17. Amazing piece of work!!

  18. […] JavaScript … split … IE … http://blog.stevenlevithan.com/archives/cross-browser-spli … […]

  19. Hello!

    I’m getting the following error in IE9:

    SCRIPT5007: String.prototype.split: ‘this’ is null or undefined
    test.js, line 44 character 13

    Line 44 looks like this:

    return nativeSplit.call(str, separator, limit);

    Works OK on Firefox and Chrome.

  20. Hello!

  22. Thank you very much. This saved me time. 🙂

  23. Thank you!!! BTW, the link to download is broken. I just copied the text provided on this page.

  24. Very Nice. It helped. Thanks.

  25. […] All this because of a(nother) buggy implementation in some browsers. Steven Levithan describes a whole series of errors regarding the split functionality (not only concerning the Internet Explorer) and offers […]

  26. […] incorrectly splits strings using a regex, as discussed here. [shakes fist at IE7]. I believe that this is the solution; if you need to support IE7, good […]

  27. Hello,

    I have been contributing to https://github.com/lautis/uglifier recently.

    May I ask you a question about your split.js?

    Could you upload split.js to your GitHub or another Git repository?

    The module below appears to use split.js, with slight modifications.
    I would like to follow updates to split.js and manage it as a submodule of this module.

    https://github.com/lautis/uglifier/blob/master/lib/split.js

    Thanks.

    Best regards,
    Jun Aruga
