When innerHTML isn’t Fast Enough

This post isn't about the pros and cons of innerHTML vs. W3C DOM methods. That has been hashed and rehashed elsewhere. Instead, I'll show how you can combine the use of innerHTML and DOM methods to make your code potentially hundreds of times faster than innerHTML on its own, when working with large numbers of elements.

In some browsers (most notably Firefox), although innerHTML is generally much faster than DOM methods, it spends a disproportionate amount of time clearing out existing elements vs. creating new ones. Knowing this, we can get the best of both worlds: destroy the existing elements almost instantly by removing their parent node with standard DOM methods, then create the new elements with innerHTML. (This technique is something I discovered during the development of RegexPal, and is one of its two main performance optimizations. The other is one-shot markup generation for match highlighting, which avoids needing to loop over matches or reference them individually.)

The code:

function replaceHtml(el, html) {
	var oldEl = typeof el === "string" ? document.getElementById(el) : el;
	/*@cc_on // Pure innerHTML is slightly faster in IE
		oldEl.innerHTML = html;
		return oldEl;
	@*/
	// Shallow clone: copies the element and its attributes, but none of
	// its children
	var newEl = oldEl.cloneNode(false);
	newEl.innerHTML = html;
	// Swapping out the old element discards all of its children at once
	oldEl.parentNode.replaceChild(newEl, oldEl);
	/* Since we just removed the old element from the DOM, return a reference
	to the new element, which can be used to restore variable references. */
	return newEl;
}

You can use the above as el = replaceHtml(el, newHtml) instead of el.innerHTML = newHtml.
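For example (a minimal sketch; "log" is a hypothetical element id used here for illustration):

var logEl = document.getElementById("log");
logEl = replaceHtml(logEl, "<p>first entry</p>");
// The node logEl originally pointed to is no longer in the document, so
// reassigning the return value matters; a stale reference would point at
// a detached element
logEl = replaceHtml(logEl, "<p>second entry</p>");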

innerHTML is already pretty fast... is this really warranted?

That depends on how many elements you're overwriting. In RegexPal, every keydown event potentially triggers the destruction and creation of thousands of elements (in order to make the syntax and match highlighting work). In such cases, the above approach has an enormous positive impact. Even something as simple as el.innerHTML += str or el.innerHTML = "" could be a performance disaster if the element you're updating happens to have a few thousand children.
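To sketch that second case (bigEl is a hypothetical reference to an element with many children), clearing through replaceHtml sidesteps the slow teardown:

// Instead of bigEl.innerHTML = "", which leaves the slow clearing of the
// existing children to the innerHTML setter...
bigEl = replaceHtml(bigEl, "");
// ...the old node and all of its children are discarded in one shot when
// the parent swaps in the empty clone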

I've created a page which allows you to easily test the performance difference between innerHTML and my replaceHtml function with various numbers of elements. Make sure to try it out in a few browsers for comparison. Following are a couple of examples of typical results from Firefox 2.0.0.6 on my system:

1000 elements...
innerHTML (destroy only): 156ms
innerHTML (create only): 15ms
innerHTML (destroy & create): 172ms
replaceHtml (destroy only): 0ms (faster)
replaceHtml (create only): 15ms (~ same speed)
replaceHtml (destroy & create): 15ms (11.5x faster)

15000 elements...
innerHTML (destroy only): 14703ms
innerHTML (create only): 250ms
innerHTML (destroy & create): 14922ms
replaceHtml (destroy only): 31ms (474.3x faster)
replaceHtml (create only): 250ms (~ same speed)
replaceHtml (destroy & create): 297ms (50.2x faster)

I think the numbers speak for themselves. Comparable performance improvements can also be seen in Safari. In Opera, replaceHtml is still typically faster than innerHTML, but by a narrower margin. In IE, simple use of innerHTML is typically faster than mixing it with DOM methods, though not by nearly the same kinds of margins as you can see above. Nevertheless, the function above uses IE's conditional compilation feature to avoid even that relatively minor penalty, by falling back to plain innerHTML in that browser.
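If you'd like to reproduce rough numbers like these outside of the test page, a minimal harness along the following lines should do. This is a sketch only: buildRows is a hypothetical helper that generates the test markup, and new Date gives coarse but adequate timing at these scales.

// Hypothetical helper: builds markup for n simple child elements
function buildRows(n) {
	var parts = [];
	for (var i = 0; i < n; i++) {
		parts.push("<div>row " + i + "</div>");
	}
	return parts.join("");
}

function time(label, fn) {
	var start = new Date().getTime();
	fn();
	alert(label + ": " + (new Date().getTime() - start) + "ms");
}

var el = document.getElementById("test"); // assumes <div id="test"></div>
var html = buildRows(15000);

el.innerHTML = html; // prepopulate so both tests destroy and create

time("innerHTML (destroy & create)", function () {
	el.innerHTML = html;
});
time("replaceHtml (destroy & create)", function () {
	el = replaceHtml(el, html);
});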

Automatic Regex Generation

Has it ever been done before? I'm interested in building an app (though I may never get to it) which can automatically generate relatively basic regular expressions simply by having the user enter some test data and mark the parts which the regex should match (all other text would serve as examples of what the regex should not match). By "relatively basic," I mean that the regexes should at least be capable of using literal characters, character classes, shorthand character classes like \d and \s, greedy quantification, alternation, grouping, and word boundaries.

Of course, there are major issues involved with this. For one, where would the algorithm draw the line between exactness and flexibility? I would imagine that it would attempt to create regexes which are as flexible as possible, since it's easy to rule out many non-matches by including a few non-matching examples which are very similar to strings that should match. For another, what would the regex operator precedence be (e.g., would it attempt quantification before alternation)? Numerous other implementation details might present challenges as well.

One thing that could sometimes improve the quality of the generated regexes is letting the generator cheat, by filtering expected results through a library of common patterns when this can be done with a reasonable level of certainty.
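As a rough sketch of that last idea (the patternLibrary contents and the helper below are invented for illustration), the generator could test each stock pattern against every string the user marked as a match and every non-match, keeping only the patterns that cleanly separate the two sets:

// Hypothetical library of common patterns the generator could "cheat" with
var patternLibrary = [
	{ name: "digits",    regex: /^\d+$/ },
	{ name: "word",      regex: /^\w+$/ },
	{ name: "hex color", regex: /^#[0-9a-fA-F]{6}$/ }
];

// Returns the library patterns that match every string the user marked
// as a match and none of the strings that should not match
function filterLibrary(matches, nonMatches) {
	var results = [];
	for (var i = 0; i < patternLibrary.length; i++) {
		var p = patternLibrary[i], ok = true, j;
		for (j = 0; ok && j < matches.length; j++) {
			ok = p.regex.test(matches[j]);
		}
		for (j = 0; ok && j < nonMatches.length; j++) {
			ok = !p.regex.test(nonMatches[j]);
		}
		if (ok) results.push(p);
	}
	return results;
}

// e.g., filterLibrary(["#ff0000", "#00ff00"], ["red", "12345"])
// keeps only the "hex color" pattern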

Have you ever considered this kind of thing in the past or seen it done elsewhere? Any thoughts on the issues or how the implementation might work? I have some ideas about simple algorithm approaches, but haven't proved out that they would work well in a respectable number of cases. Can you see something like this being useful to you?