Multiple String Replacement Sugar

How many times have you needed to run multiple replacement operations on the same string? It's not too bad, but can get a bit tedious if you write code like this a lot.

str = str.
	replace( /&(?!#?\w+;)/g , '&amp;'    ).
	replace( /"([^"]*)"/g   , '“$1”'     ).
	replace( /</g           , '&lt;'     ).
	replace( />/g           , '&gt;'     ).
	replace( /…/g           , '&hellip;' ).
	replace( /“/g           , '&ldquo;'  ).
	replace( /”/g           , '&rdquo;'  ).
	replace( /‘/g           , '&lsquo;'  ).
	replace( /’/g           , '&rsquo;'  ).
	replace( /—/g           , '&mdash;'  ).
	replace( /–/g           , '&ndash;'  );

A common trick to shorten such code is to look up replacement values using an object as a hash table. Here's a simple implementation of this.

var hash = {
	'<' : '&lt;'    ,
	'>' : '&gt;'    ,
	'…' : '&hellip;',
	'“' : '&ldquo;' ,
	'”' : '&rdquo;' ,
	'‘' : '&lsquo;' ,
	'’' : '&rsquo;' ,
	'—' : '&mdash;' ,
	'–' : '&ndash;'
};

str = str.
	replace( /&(?!#?\w+;)/g , '&amp;' ).
	replace( /"([^"]*)"/g   , '“$1”'  ).
	replace( /[<>…“”‘’—–]/g , function ( $0 ) {
		return hash[ $0 ];
	});

However, this approach has some limitations.

  • Search patterns are repeated in the hash table and the regular expression character class.
  • Both the search and replacement are limited to plain text. That's why the first and second replacements had to remain separate in the above code. The first replacement used a regex search pattern, and the second used a backreference in the replacement text.
  • Replacements don't cascade. This is another reason why the second replacement operation had to remain separate. I want text like "this" to first be replaced with “this”, and eventually end up as &ldquo;this&rdquo;.
  • It doesn't work in Safari 2.x and other old browsers that don't support using functions to generate replacement text.

With a few lines of String.prototype sugar, you can deal with all of these issues.

String.prototype.multiReplace = function ( hash ) {
	var str = this, key;
	for ( key in hash ) {
		str = str.replace( new RegExp( key, 'g' ), hash[ key ] );
	}
	return str;
};

Now you can use code like this:

str = str.multiReplace({
	'&(?!#?\\w+;)' : '&amp;'   ,
	'"([^"]*)"'    : '“$1”'    ,
	'<'            : '&lt;'    ,
	'>'            : '&gt;'    ,
	'…'            : '&hellip;',
	'“'            : '&ldquo;' ,
	'”'            : '&rdquo;' ,
	'‘'            : '&lsquo;' ,
	'’'            : '&rsquo;' ,
	'—'            : '&mdash;' ,
	'–'            : '&ndash;'
});
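
To watch the cascade happen, here is a small run that can be pasted into a console. It repeats the multiReplace definition from above so it runs on its own; the input string and the result variable are made up for illustration, and the top-to-bottom behavior assumes the engine enumerates these keys in insertion order.

```javascript
// Repeated from above so this snippet is self-contained.
String.prototype.multiReplace = function ( hash ) {
	var str = this, key;
	for ( key in hash ) {
		str = str.replace( new RegExp( key, 'g' ), hash[ key ] );
	}
	return str;
};

// Hypothetical input, chosen to exercise several of the rules.
var result = 'He said "R&D" <loudly>'.multiReplace({
	'&(?!#?\\w+;)' : '&amp;'   ,
	'"([^"]*)"'    : '“$1”'    ,
	'<'            : '&lt;'    ,
	'>'            : '&gt;'    ,
	'“'            : '&ldquo;' ,
	'”'            : '&rdquo;'
});

console.log( result );
// He said &ldquo;R&amp;D&rdquo; &lt;loudly&gt;
```

The straight quotes become curly quotes in the second step and entities in the last two, which is the cascade the plain lookup table could not do.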

If you care about the order of replacements, you should be aware that the current JavaScript specification does not require a particular enumeration order when looping over object properties with for..in. However, recent versions of the big four browsers (IE, Firefox, Safari, Opera) all use insertion order, which allows this to work as described (from top to bottom). ECMAScript 4 proposals indicate that the insertion-order convention will be formally codified in that standard.
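
One wrinkle worth checking in whatever engine you target: in engines that follow the ES2015 property ordering rules, keys that look like array indexes (such as '1' or '42') are visited first, in ascending numeric order, no matter where they were written. A search pattern made only of digits would therefore run earlier than its position suggests. The sketch below uses a hypothetical enumerationOrder helper just to print the order for..in actually uses.

```javascript
// Hypothetical helper: collect keys in the order for..in visits them.
function enumerationOrder( hash ) {
	var keys = [], key;
	for ( key in hash ) {
		keys.push( key );
	}
	return keys;
}

// Two ordinary keys and two index-like keys, deliberately interleaved.
var order = enumerationOrder({ 'b': 'x', '2': 'y', 'a': 'z', '1': 'w' });

console.log( order.join( ', ' ) );
```

In an engine that applies that rule this logs 1, 2, b, a rather than b, 2, a, 1. If order matters and a pattern might be all digits, an array of pairs avoids the question entirely.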

If you need to worry about rogue properties that show up when people mess with Object.prototype, you can update the code as follows:

String.prototype.multiReplace = function ( hash ) {
	var str = this, key;
	for ( key in hash ) {
		if ( Object.prototype.hasOwnProperty.call( hash, key ) ) {
			str = str.replace( new RegExp( key, 'g' ), hash[ key ] );
		}
	}
	return str;
};

Calling the hasOwnProperty method on Object.prototype rather than on the hash object directly allows this method to work even when you're searching for the string "hasOwnProperty".
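
A minimal sketch of the failure this avoids, using a made-up hash and variable names. When the hash contains a key named hasOwnProperty, looking the method up on the hash finds the replacement string instead of the function:

```javascript
// A hash whose search pattern happens to be the text "hasOwnProperty".
var hash = { 'hasOwnProperty': 'hOP' };

// Looking the method up on the hash itself finds the string 'hOP',
// so trying to call it throws.
var direct;
try {
	direct = hash.hasOwnProperty( 'hasOwnProperty' );
} catch ( e ) {
	direct = e instanceof TypeError ? 'TypeError' : 'other error';
}

// Borrowing the method from Object.prototype is unaffected by the key.
var borrowed = Object.prototype.hasOwnProperty.call( hash, 'hasOwnProperty' );

console.log( direct );   // TypeError
console.log( borrowed ); // true
```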

Lemme know if you think this is useful.

13 thoughts on “Multiple String Replacement Sugar”

  1. I just wondered if it would be possible to compile these data structures into one regular expression (using OR pipes). Compiling a function that uses this regular expression and automatically replaces all the items (maybe using the map) should perform better, as it avoids the intermediate strings that are never used afterwards. What do you think?

  2. @Sebastian, this was inspired by a recent blog post by MetaDeveloper David Seruyange. What you described is the approach David used in his C# code (ported from some Python code by Xavier Defrang), so it was my first thought as well. Here’s a rough port of David’s code to JavaScript:

    function multiReplace (str, hash) {
    	var keys = [], key;
    	for (key in hash) {
    		keys.push(key);
    	}
    	return str.replace(new RegExp(keys.join('|'), 'g'), function ($0) {
    		return hash[$0];
    	});
    };
    

    However, that’s not as robust as the approach I described here, since it doesn’t support regex search patterns, backreferences in replacement text, or cascading replacements. Using regex metacharacters in the search text will in fact cause bad things to happen, unless you add special handling for that.
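
As a rough idea of what that special handling could look like, the sketch below escapes each key before joining, so every key is treated as literal text. The names escapeRegex and multiReplaceLiteral are hypothetical, and this still gives up regex patterns, backreferences, and cascading.

```javascript
// Hypothetical helper: backslash-escape regex metacharacters in literal text.
function escapeRegex (text) {
	return text.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
}

// Variant of the port above that treats every key as literal text.
function multiReplaceLiteral (str, hash) {
	var keys = [], key;
	for (key in hash) {
		if (Object.prototype.hasOwnProperty.call(hash, key)) {
			keys.push(escapeRegex(key));
		}
	}
	if (!keys.length) {
		return str; // nothing to search for
	}
	return str.replace(new RegExp(keys.join('|'), 'g'), function ($0) {
		return hash[$0];
	});
}

var out = multiReplaceLiteral('a+b = (c)', { '+': ' plus ', '(': '[', ')': ']' });
console.log(out); // a plus b = [c]
```

When one key is a prefix of another, the alternation tries keys in the order listed, so longer keys would need to come first.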

  3. @Dean Edwards, at my loss, I’ve never really checked out your Base2 source code or documentation. I’ll try to take a closer look at your RegGrp class later, since it seems quite interesting.

    A minor note about one of the examples in your RegGrp documentation:

    parser.add(/(['"])[^\1]*\1/, RegGrp.IGNORE); // ignore string values

    That doesn’t work as described, since for one thing, [^\1] matches any character that is not at octal index 1 in the character table. I’ve got an old post on matching quoted strings.
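
A quick way to see the difference, with a made-up input string. The second pattern is one common alternative (a negative lookahead before each character), not necessarily the one from that post, and it makes no attempt to handle escaped quotes.

```javascript
var input = 'say "a" and "b"';

// Inside a character class, \1 is not a backreference, so [^\1]* runs
// past the first closing quote and only stops at the last one.
var loose = input.match(/(['"])[^\1]*\1/)[0];

// A negative lookahead checks the backreference before each character.
var strict = input.match(/(['"])(?:(?!\1).)*\1/)[0];

console.log(loose);  // "a" and "b"
console.log(strict); // "a"
```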

  4. If you want extra flexibility, why not use an array:-

    var replacements = [
        [/a/g, "b"],
        [/c/g, "d"]
    ];
    
    String.prototype.multiReplace = function ( replacements ) {
    	var str = this, i;
    	for (i = 0; i < replacements.length; i++ ) {
    		str = str.replace(replacements[i][0], replacements[i][1]);
    	}
    	return str;
    };

    This guarantees the order, saves having to generate the RegExp each time, and lets you use different switches.
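
For instance, a later rule can depend on an earlier one and use different flags. The snippet below is a standalone sketch of the same loop under a hypothetical name (replaceInOrder) so it does not touch String.prototype; the sample pairs are invented.

```javascript
// Standalone variant (hypothetical name) of the array-based approach.
function replaceInOrder (str, replacements) {
	var i;
	for (i = 0; i < replacements.length; i++) {
		str = str.replace(replacements[i][0], replacements[i][1]);
	}
	return str;
}

var pairs = [
	[/cat/gi, 'dog'], // global and case-insensitive
	[/^dog/,  'Dog']  // no g flag: first match only, runs second
];

var renamed = replaceInOrder('Cat chases cat', pairs);
console.log(renamed); // Dog chases dog
```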

  5. @Julian Turner, but that takes three extra characters per replacement. 😛

    Kidding aside, I agree that simple/flexible is better. I was mostly just experimenting and playing here, especially since the standard replacement syntax (shown at the beginning of this post) isn’t that bad to begin with.

  6. 3 extra characters – oh no, code bloat!

    No, I appreciate this is a bit of play.

    I tried experimenting with doing a “RegExp.prototype.test()” before the “replace()” to see if that gave any performance improvement by saving unnecessary “str = str.replace” calls. No, it was worse!

    Since operations on immutable things such as strings can, I believe, be very efficient, str = str.replace is very performant even for a lot of replacements, so there does not seem to be any reason to optimise further for the general case.

  7. This ought to be about the fastest, as it uses:
    – only one regex call, a very simple split
    – many non-regex string replace()
    – one join()

    —jim

    // Entity names are case-sensitive: 'Aacute' is Á, 'aacute' is á.
    var html_ents = { '#160':'\u00A0', '#161':'¡','#162':'¢','#163':'£','#165':'¥',
    	'#167':'§','#169':'©','#171':'«','#174':'®','#177':'±','#180':'´',
    	'#181':'µ','#182':'¶','#183':'·','#187':'»','#191':'¿','#192':'À',
    	'#193':'Á','#194':'Â','#195':'Ã','#196':'Ä','#197':'Å','#198':'Æ',
    	'#199':'Ç','#200':'È','#201':'É','#202':'Ê','#203':'Ë','#204':'Ì',
    	'#205':'Í','#206':'Î','#207':'Ï','#209':'Ñ','#210':'Ò','#211':'Ó',
    	'#212':'Ô','#213':'Õ','#214':'Ö','#216':'Ø','#217':'Ù','#218':'Ú',
    	'#219':'Û','#220':'Ü','#223':'ß','#224':'à','#225':'á','#226':'â',
    	'#227':'ã','#228':'ä','#229':'å','#230':'æ','#231':'ç','#232':'è',
    	'#233':'é','#234':'ê','#235':'ë','#236':'ì','#237':'í','#238':'î',
    	'#239':'ï','#241':'ñ','#242':'ò','#243':'ó','#244':'ô','#245':'õ',
    	'#246':'ö','#247':'÷','#248':'ø','#249':'ù','#250':'ú','#251':'û',
    	'#252':'ü','#255':'ÿ','#34':'"','#38':'&','#60':'<','#62':'>',
    	'#8211':'–','#8212':'—','#8364':'€','#96':'`',
    	'Aacute':'Á','aacute':'á','Acirc':'Â','acirc':'â','AElig':'Æ',
    	'aelig':'æ','Agrave':'À','agrave':'à','amp':'&','Aring':'Å',
    	'aring':'å','Atilde':'Ã','atilde':'ã','Auml':'Ä','auml':'ä',
    	'Ccedil':'Ç','ccedil':'ç','cent':'¢','copy':'©','divide':'÷',
    	'Eacute':'É','eacute':'é','Ecirc':'Ê','ecirc':'ê','Egrave':'È',
    	'egrave':'è','Euml':'Ë','euml':'ë','euro':'€','gt':'>',
    	'Iacute':'Í','iacute':'í','Icirc':'Î','icirc':'î','iexcl':'¡',
    	'Igrave':'Ì','igrave':'ì','iquest':'¿','Iuml':'Ï','iuml':'ï',
    	'laquo':'«','lt':'<','mdash':'—','micro':'µ','middot':'·',
    	'nbsp':'\u00A0','ndash':'–','Ntilde':'Ñ','ntilde':'ñ','Oacute':'Ó',
    	'oacute':'ó','Ocirc':'Ô','ocirc':'ô','Ograve':'Ò','ograve':'ò',
    	'Oslash':'Ø','oslash':'ø','Otilde':'Õ','otilde':'õ','Ouml':'Ö',
    	'ouml':'ö','para':'¶','plusmn':'±','pound':'£','quot':'"',
    	'raquo':'»','reg':'®','sect':'§','szlig':'ß','Uacute':'Ú',
    	'uacute':'ú','Ucirc':'Û','ucirc':'û','Ugrave':'Ù','ugrave':'ù',
    	'Uuml':'Ü','uuml':'ü','yen':'¥','yuml':'ÿ'};
    
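    function html_entity_decode(str) {
    	// Split on "&" and ";" so entity names land at the odd indexes.
    	// This assumes neither character appears outside an entity.
    	var parts = str.split(/&|;/), i, stop;

    	for (i = 1, stop = parts.length; i < stop; i += 2) {
    		parts[i] = html_ents[parts[i]];
    	}

    	return parts.join('');
    }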
    
    function init() {
    	document.writeln('some &lt;b&gt;bold&lt;/b&gt; text');
    	document.writeln('<br><br>');
    	document.writeln('some &lt;b&gt;bold&lt;/b&gt; text'.split(/&|;/));
    	document.writeln('<br><br>');
    	document.writeln(html_entity_decode('some &lt;b&gt;bold&lt;/b&gt; text'));
    	document.writeln('<br><br>');
    	document.writeln(html_entity_decode('&lt;b&gt;bold&lt;/b&gt; text'));
    }

    init();

  8. Here is a version of multiReplace that will not overwrite a change if the change creates a new match later in the chain. The other multiReplace function’s results are dependent on the order the replacements are listed in the hash array.

    Also included is an example that will do php-like urlEncode on a string.

    function mreplace(str,hash){
    var a = [];
    for(var key in hash){a[a.length] = key;}
    var regexp = a.join('|');
    return str.replace( new RegExp(regexp,'g'), function(m,l){ return hash[m] || hash["\\"+m]; });
    }

    function urlEncode(a){
    return mreplace(escape(a),{'\\/':'%2F','\\?':'%3F','=':'%3D','&':'%26','@':'%40','\\s':'+'});
    }

  9. return text.replace(/\\/g, '\\\\').replace(/\(/g, '\\(').replace(/\)/g, '\\)');

    It replaces every \n in the text, but I want to prevent that. What should I do with this code?

  10. Thanks for sharing this Steven. I am using a slightly modified concoction out of your sample, and the one posted by @Julian Turner.

    if(!Object.prototype.hasOwnProperty.call(String.prototype, 'replaceAll')) { // Check if this isn't already defined
    String.prototype["replaceAll"] = function ( replacements ) {
    var str = this;
    replacements = [].concat(replacements); // Ensure we will always have an array
    for (var i = 0; i < replacements.length; i++ ) str = str.replace(replacements[i][0], replacements[i][1]);

    return str;
    };
    }

    I have included a few important checks:

    1. Ensuring that replaceAll is not overwritten if it is already declared on String.prototype
    2. Ensuring that replaceAll receives an Array

    Hope this helps someone. Thanks a lot!

