Faster JavaScript Trim

Since JavaScript doesn't include a trim method natively, it's included by countless JavaScript libraries – usually as a global function or appended to String.prototype. However, I've never seen an implementation which performs as well as it could, probably because most programmers don't deeply understand or care about regex efficiency issues.

After seeing a particularly bad trim implementation, I decided to do a little research towards finding the most efficient approach. Before getting into the analysis, here are the results:

Method    Firefox 2    IE 6
trim1     15ms         < 0.5ms
trim2     31ms         < 0.5ms
trim3     46ms         31ms
trim4     47ms         46ms
trim5     156ms        1656ms
trim6     172ms        2406ms
trim7     172ms        1640ms
trim8     281ms        < 0.5ms
trim9     125ms        78ms
trim10    < 0.5ms      < 0.5ms
trim11    < 0.5ms      < 0.5ms

Note 1: The comparison is based on trimming the Magna Carta (over 27,600 characters) with a bit of leading and trailing whitespace 20 times on my personal system. However, the data you're trimming can have a major impact on performance, which is detailed below.

Note 2: trim4 and trim6 are the most commonly found in JavaScript libraries today.

Note 3: The aforementioned bad implementation is not included in the comparison, but is shown later.

The analysis

Although there are 11 rows in the table above, they are only the most notable (for various reasons) of about 20 versions I wrote and benchmarked against various types of strings. The following analysis is based on testing in Firefox 2.0.0.4, although I have noted where there are major differences in IE6.

  1. return str.replace(/^\s\s*/, '').replace(/\s\s*$/, '');
    All things considered, this is probably the best all-around approach. Its speed advantage is most notable with long strings, which is when efficiency matters. The speed is largely due to a number of optimizations internal to JavaScript regex interpreters which the two discrete regexes here trigger: specifically, the pre-check of required character and start-of-string anchor optimizations, possibly among others.
  2. return str.replace(/^\s+/, '').replace(/\s+$/, '');
    Very similar to trim1 (above), but a little slower since it doesn't trigger all of the same optimizations.
  3. return str.substring(Math.max(str.search(/\S/), 0), str.search(/\S\s*$/) + 1);
    This is often faster than the following methods, but slower than the above two. Its speed comes from its use of simple, character-index lookups.
  4. return str.replace(/^\s+|\s+$/g, '');
    This commonly thought up approach is easily the most frequently used in JavaScript libraries today. It is generally the fastest implementation of the bunch only when working with short strings which don't include leading or trailing whitespace. This minor advantage is due in part to the initial-character discrimination optimization it triggers. While this is a relatively decent performer, it's slower than the three methods above when working with longer strings, because the top-level alternation prevents a number of optimizations which could otherwise kick in.
  5. str = str.match(/\S+(?:\s+\S+)*/);
    return str ? str[0] : '';

    This is generally the fastest method when working with empty or whitespace-only strings, due to the pre-check of required character optimization it triggers. Note: In IE6, this can be quite slow when working with longer strings.
  6. return str.replace(/^\s*(\S*(\s+\S+)*)\s*$/, '$1');
    This is a relatively common approach, popularized in part by some leading JavaScripters. It's similar in approach (but inferior) to trim8. There's no good reason to use this in JavaScript, especially since it can be very slow in IE6.
  7. return str.replace(/^\s*(\S*(?:\s+\S+)*)\s*$/, '$1');
    The same as trim6, but a bit faster due to the use of a non-capturing group (which doesn't work in IE 5.0 and lower). Again, this can be slow in IE6.
  8. return str.replace(/^\s*((?:[\S\s]*\S)?)\s*$/, '$1');
    This uses a simple, single-pass, greedy approach. In IE6, this is crazy fast! The performance difference indicates that IE has superior optimization for quantification of "any character" tokens.
  9. return str.replace(/^\s*([\S\s]*?)\s*$/, '$1');
    This is generally the fastest with very short strings which contain both non-space characters and edge whitespace. This minor advantage is due to the simple, single-pass, lazy approach it uses. Like trim8, this is significantly faster in IE6 than Firefox 2.
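
Before comparing speed, it can be reassuring to confirm that the nine regex-based variants above agree on their output. The following is only a minimal sketch and was not part of the benchmark; the `variants` object, the `samples` array, and the `allAgree` helper are names made up for this illustration.

```javascript
// The nine regex-based variants from the list above, keyed by name.
var variants = {
	trim1: function (str) { return str.replace(/^\s\s*/, '').replace(/\s\s*$/, ''); },
	trim2: function (str) { return str.replace(/^\s+/, '').replace(/\s+$/, ''); },
	trim3: function (str) {
		return str.substring(Math.max(str.search(/\S/), 0), str.search(/\S\s*$/) + 1);
	},
	trim4: function (str) { return str.replace(/^\s+|\s+$/g, ''); },
	trim5: function (str) {
		var m = str.match(/\S+(?:\s+\S+)*/);
		return m ? m[0] : '';
	},
	trim6: function (str) { return str.replace(/^\s*(\S*(\s+\S+)*)\s*$/, '$1'); },
	trim7: function (str) { return str.replace(/^\s*(\S*(?:\s+\S+)*)\s*$/, '$1'); },
	trim8: function (str) { return str.replace(/^\s*((?:[\S\s]*\S)?)\s*$/, '$1'); },
	trim9: function (str) { return str.replace(/^\s*([\S\s]*?)\s*$/, '$1'); }
};

// Hypothetical sample inputs: empty, whitespace-only, no edge whitespace, mixed.
var samples = ['', '   ', 'a', '  a  ', ' a b ', '\t\nfoo  bar\r\n'];

// Returns true if every variant matches trim1's output for every sample.
function allAgree() {
	for (var name in variants) {
		for (var i = 0; i < samples.length; i++) {
			if (variants[name](samples[i]) !== variants.trim1(samples[i])) {
				return false;
			}
		}
	}
	return true;
}
```

A check like this says nothing about speed, and it only covers the whitespace characters that appear in the samples.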

Since I've seen the following additional implementation in one library, I'll include it here as a warning:

return str.replace(/^\s*([\S\s]*)\b\s*$/, '$1');

Although the above is sometimes the fastest method when working with short strings which contain both non-space characters and edge whitespace, it performs very poorly with long strings which contain numerous word boundaries, and it's terrible (!) with long strings composed of nothing but whitespace, since \b can never match there and the two greedy quantifiers ahead of it backtrack through every way of splitting the string, so the work grows roughly with the square of the string's length. Do not use.
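
To get a rough feel for how the work grows on whitespace-only input, the sketch below counts how often a naive backtracking matcher would test \b for that regex. This is a simplified model under stated assumptions, not a trace of any real engine, and `countBoundaryTests` is a hypothetical name.

```javascript
// Simplified model (an assumption, not a measurement): for a whitespace-only
// string of length n, count how often a naive backtracking matcher tests \b
// while trying /^\s*([\S\s]*)\b\s*$/.
function countBoundaryTests(n) {
	var tests = 0;
	// \s* is greedy: it first takes all n characters, then gives them back one by one.
	for (var lead = n; lead >= 0; lead--) {
		// [\S\s]* takes everything that is left, and also backs off one by one.
		for (var body = n - lead; body >= 0; body--) {
			// \b is tested here and always fails, because a string without
			// word characters contains no word boundary.
			tests++;
		}
	}
	return tests;
}
```

Under this model, countBoundaryTests(1000) is 501501 and countBoundaryTests(2000) is 2003001, so doubling the length roughly quadruples the work.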

A different endgame

There are two methods in the table at the top of this post which haven't been covered yet. For those, I've used a non-regex and hybrid approach.

After comparing and analyzing all of the above, I wondered how an implementation which used no regular expressions would perform. Here's what I tried:

function trim10 (str) {
	var whitespace = ' \n\r\t\f\x0b\xa0\u2000\u2001\u2002\u2003\u2004\u2005\u2006\u2007\u2008\u2009\u200a\u200b\u2028\u2029\u3000';
	for (var i = 0; i < str.length; i++) {
		if (whitespace.indexOf(str.charAt(i)) === -1) {
			str = str.substring(i);
			break;
		}
	}
	for (i = str.length - 1; i >= 0; i--) {
		if (whitespace.indexOf(str.charAt(i)) === -1) {
			str = str.substring(0, i + 1);
			break;
		}
	}
	return whitespace.indexOf(str.charAt(0)) === -1 ? str : '';
}

How does that perform? Well, with long strings which do not contain excessive leading or trailing whitespace, it blows away the competition (except against trim1/2/8 in IE, which are already insanely fast there).

Does that mean regular expressions are slow in Firefox? No, not at all. The issue here is that although regexes are very well suited for trimming leading whitespace, apart from the .NET library (which offers a somewhat-mysterious "backwards matching" mode), they don't really provide a method to jump to the end of a string without even considering previous characters. However, the non-regex-reliant trim10 function does just that, with the second loop working backwards from the end of the string until it finds a non-whitespace character.
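
The effect of scanning backwards can be made visible by counting how many characters the loop actually inspects. The following is a minimal sketch: `trimEndCounting` is a hypothetical helper, and for brevity it tests characters with /\s/ where trim10 uses an explicit whitespace string.

```javascript
// Hypothetical helper: trims trailing whitespace by scanning backwards and
// reports how many characters were inspected along the way.
function trimEndCounting(str) {
	var ws = /\s/;
	var i = str.length - 1;
	var inspected = 0;
	while (i >= 0) {
		inspected++;
		if (!ws.test(str.charAt(i))) {
			break; // found the last non-whitespace character
		}
		i--;
	}
	return { result: str.substring(0, i + 1), inspected: inspected };
}

// A made-up input: 10,000 non-whitespace characters plus two trailing spaces.
var longInput = new Array(10001).join('x') + '  ';
var counted = trimEndCounting(longInput);
```

Here counted.inspected is 3 (the two spaces plus the final "x"), no matter how long the rest of the string is.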

Knowing that, what if we created a hybrid implementation which combined a regex's universal efficiency at trimming leading whitespace with the alternative method's speed at removing trailing characters?

function trim11 (str) {
	str = str.replace(/^\s+/, '');
	for (var i = str.length - 1; i >= 0; i--) {
		if (/\S/.test(str.charAt(i))) {
			str = str.substring(0, i + 1);
			break;
		}
	}
	return str;
}

Although the above is a bit slower than trim10 with some strings, it uses significantly less code and is still lightning fast. Plus, with strings which contain a lot of leading whitespace (which includes strings comprised of nothing but whitespace), it's much faster than trim10.

In conclusion…

Since the differences between the implementations, both cross-browser and when used with different data, are complex and nuanced (none of them is faster than all the others with every kind of data you can throw at it), here are my general recommendations for a trim method:

  • Use trim1 if you want a general-purpose implementation which is fast cross-browser.
  • Use trim11 if you want to handle long strings exceptionally fast in all browsers.

To test all of the above implementations for yourself, try my very rudimentary benchmarking page. Background processing can cause the results to be severely skewed, so run the test a number of times (regardless of how many iterations you specify) and only consider the fastest results (since averaging the cost of background interference is not very enlightening).
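
The "run it several times and keep the fastest result" advice can be sketched as follows. This is not the code behind the benchmarking page; `fastestRun`, `sampleInput`, and the iteration counts are assumptions chosen for illustration.

```javascript
// Hypothetical helper: calls fn(input) `iterations` times per run, repeats
// that for `runs` runs, and returns the fastest run's duration in ms.
function fastestRun(fn, input, iterations, runs) {
	var best = Infinity;
	for (var r = 0; r < runs; r++) {
		var start = new Date().getTime();
		for (var i = 0; i < iterations; i++) {
			fn(input);
		}
		var elapsed = new Date().getTime() - start;
		if (elapsed < best) {
			best = elapsed; // keep only the fastest run
		}
	}
	return best;
}

// Example: time trim1 on a made-up input with a bit of edge whitespace.
function trim1(str) {
	return str.replace(/^\s\s*/, '').replace(/\s\s*$/, '');
}
var sampleInput = '  ' + new Array(1001).join('lorem ipsum ') + '  ';
var fastestMs = fastestRun(trim1, sampleInput, 20, 5);
```

Since Date only has millisecond resolution, very fast implementations may simply report 0 with a harness like this.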

As a final note, although some people like to cache regular expressions (e.g. using global variables) so they can be used repeatedly without recompilation, IMO this does not make much sense for a trim method. All of the above regexes are so simple that they typically take a negligible amount of time to compile. Additionally, some browsers automatically cache the most recently used regexes, so a typical loop which uses trim and doesn't contain a bunch of other regexes might not encounter recompilation anyway.
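
For readers who want to cache the regexes anyway, a closure is one way to do it without global variables. This is a minimal sketch based on trim1; the name `trimCached` is hypothetical.

```javascript
// Hypothetical cached variant of trim1: the two regex objects are created
// once, when the enclosing function runs, and reused on every call.
var trimCached = (function () {
	var leading = /^\s\s*/;
	var trailing = /\s\s*$/;
	return function (str) {
		return str.replace(leading, '').replace(trailing, '');
	};
})();
```

Because neither regex uses the g flag, reusing the same objects carries no lastIndex state from one call to the next.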


Edit (2008-02-04): Shortly after posting this I realized trim10/11 could be better written. Several people have also posted improved versions in the comments. Here's what I use now, which takes the trim11-style hybrid approach:

function trim12 (str) {
	var	str = str.replace(/^\s\s*/, ''),
		ws = /\s/,
		i = str.length;
	while (ws.test(str.charAt(--i)));
	return str.slice(0, i + 1);
}
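
The while loop above has no explicit bounds check, which can look suspicious at first. The sketch below restates the same logic with comments on why it still terminates; the name `trim12Commented` is made up for this illustration.

```javascript
// Same logic as trim12 above, restated with comments (hypothetical name).
function trim12Commented(str) {
	// After this step, str is either empty or starts with a non-whitespace
	// character.
	str = str.replace(/^\s\s*/, '');
	var ws = /\s/;
	var i = str.length;
	// Decrement i, then test the character at that index. For a non-empty
	// string the loop stops at index 0 at the latest, since that character is
	// not whitespace. For an empty string the first index tested is -1, where
	// charAt returns '' and /\s/ does not match, so the loop stops at once.
	while (ws.test(str.charAt(--i))) {}
	return str.slice(0, i + 1);
}
```

With empty or whitespace-only input, i ends at -1 and slice(0, 0) yields the empty string.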


181 thoughts on “Faster JavaScript Trim”

  1. Oh, and one more thing about the trim10 function as you have it posted now. IE doesn’t recognize ‘\v’ as a metacharacter in Javascript strings, so a literal ‘v’ ends up in the whitespace string. Try trimming a string starting with ‘v’ – it’ll chop the leading ‘v’ as well. (That one drove me nuts for a full 20 minutes.) Replace it with \x0b and all’s well. ^_^

  2. @Scott Trenda, interesting about IE not interpreting \v as a vertical tab when embedded in a string literal, especially since it is handled correctly in regexes (/\v/.test("\x0b") == true). I’ll fix that in trim10.

    I’ve actually been using something very similar to your trim13 recently. The only (edge case) problem is that browsers interpret \s differently (see JavaScript, Regex, and Unicode, and the test page). I’ve left the very quick and dirty trim10/11 up there since their ugliness seems to have inspired others to improve them.

    As for your observations about ^\s* vs. ^\s+, that depends on the implementation. Another alternative to consider is ^\s\s*. The difference stems from internal optimizations like whether or not a pre-check of required characters is performed, and the relative cost of success vs. failure to match.

  3. The method that GWT uses to translate String.trim():

    public native String trim() /*-{
    var r1 = this.replace(/^(\s*)/, '');
    var r2 = r1.replace(/\s*$/, '');
    return r2;
    }-*/;

  4. Hi,

    Nice work. But one thing I'm missing is a pointer to the different \s handling in browsers.

    ECMAScript specifies \s as [\t\n\v\f\r], Firefox added [\u00A0\u2028\u2029] to the list.

    Opera happens to match &nbsp; with \s like Firefox. Safari behaves like IE and doesn't match &nbsp; with \s.

    more: http://dev.mootools.net/ticket/646

    I didn't test \u2028 and \u2029, but I think /[\s\u00A0\u2028\u2029]+/g should fix this for all browsers, as it adds Firefox's additions to \s.

  5. ERROR in function 10 and 11:

    for (var i = str.length - 1; i > 0; i--) {
    => for (var i = str.length - 1; i >= 0; i--) {

  6. @Tiziano, that is not an error. The way they are written, there is no reason for the backwards loops to be concerned about the character at index 0.

  7. It seems to me that your loop does far more work than necessary. Why shorten the string one character at a time? Keep the loop counter around and you can do all the shortening at once. That’s bound to be faster.

    var i = str.length - 1;
    while ( i >= 0 && /\s/.test(str.charAt(i)) ) --i;
    str = str.substring(0, i + 1);

    Also, it might be worth pulling the regex construction out of the loop; that depends on how well Javascript compilers optimise.

    var ws = /\s/;
    while ( i >= 0 && ws.test(str.charAt(i)) ) --i;

  8. “there is no reason for the backwards loops to be concerned about the character at index 0.”

    True, but anyone who ever modifies the code needs to be aware that there’s an inactive bug in the code, lest they accidentally activate it. And it doesn’t cost anything at all to make the check correct. So there’s no reason not to fix it.

  9. @Aristotle Pagaltzis, putting the regex outside the loop shouldn’t matter according to ECMA-262 3rd Edition since the spec states that regex literals cause only one object to be created at runtime for a script or function. However, most implementations don’t respect that (Firefox does), and in any case the behavior is proposed to be changed in ECMAScript 4.

    Regarding changing the loop counter to allow an extra iteration… I’ve realized that Tiziano was correct. It was in fact an error in the case where there is whitespace to the right and only one non-whitespace character. I’ve fixed it, but the trim10/11 implementations are ugly anyway, as you’ve pointed out. Although I’ve intentionally been avoiding this for some time, I’ve gone ahead and edited the post to show a cleaner version of the trim11 approach at the end (which is nearly identical to what Scott Trenda posted earlier).

  10. @Yves:

    You can’t use lastIndexOf for this problem. That method only gives you a way to ask for the last appearance of a specific character, but what we need is a way to ask for the last appearance of any other character than a space. Additionally, \s in a regex doesn’t find just space characters, but a number of other whitespace characters as well. You can’t do that with lastIndexOf at all.

    @Steve:

    Now that I think of it, how does the following version fare?

    return str.replace(/^\s+/, '').replace(/.*\s+$/, '');

    This should be faster than any of the regex approaches you showed above. A class like [\s\S] is kinda silly – “match anything that’s whitespace or is not whitespace” is just a long-winded way to say “match anything,” except that non-IE browsers should be able to optimise it as well, and in fact many regex engines have special optimisations for .* built in. This should gobble up the entire string immediately and then do the same backtrack-loop as the explicit Javascript code does, except without crossing back and forth between the JS VM and the RE engine at every backtracking step in order to involve the JS VM dispatcher.

    But that’s just theory-based hypothesis – benchmarking is in order to confirm (or disprove) it.

  11. Oh! D’uh. Disregard the above suggestion. That won’t work for obvious reasons.

    I got confused because I do this in conjunction with \zs in Vim all the time. In Vim you could write s/.*\zs\s\s*$// and it would replace just the part after the \zs. In Perl 5.10 you can do the same using the \K escape. But Javascript has neither extension, so… yeah.

  12. \K would be very nice to have, especially since JavaScript has no lookbehind. But yeah, you can’t do that. Note that something like [\S\s] is necessary because JavaScript has no “single-line” (dot matches all) mode.

  13. OK, I think I’ve unbrainfarted myself enough to actually try my idea of using a greedy match and RE engine backtracking. Sorry for all the noise. Here’s trim12:

    function trim12 (str) {
        var str = str.replace(/^\s\s*/, ''),
        len = str.length;
     
        if (len && /\s/.test(str.charAt(len - 1))) {
            var re = /.*\S/g;
            re.test(str);
            str = str.slice(0, re.lastIndex);
        }
     
        return str;
    }

    The trick here is as follows. The inner regex is run only if the string is non-empty, which means there must be non-whitespace characters in it, because otherwise the first substitution would have left an empty string, and only if the last character is whitespace. In that case, we run a global match that first gobbles up the entire string using .*, then backtracks until it can match \S. We know it must match because at this point we know the string ends with whitespace and we know it has non-whitespace characters in it. After the match, because it is global (/g flag), the position of the character after the end of the match will be recorded in the lastIndex property of the regex object.

    So we just use that to return the portion of the string before it.

    Please benchmark this. I’ve tested it and I know it works; now the question is how fast it is.

  14. “JavaScript has no ‘single-line’ (dot matches all) mode.”

    Argh, now I see that. I guess explicitness would demand [\0-\uFFFF], but that’s clearly more cumbersome to type and read than [\s\S]. Sigh.

    (Hopefully I will stop spamming your comments now. Sorry again.)
