JavaScript — Flagrant Badassery

JavaScript Roman Numeral Converter

While looking for something quick to do during a brief internet outage, I wrote some code to convert to and from Roman numerals. Once things were back up I searched for equivalent code, but only found stuff that was multiple pages long, limited the range of what it could convert, or both. I figured I might as well share what I came up with:

function romanize (num) {
  if (!+num) return false;
  var digits = String(+num).split('');
  var key = ['','C','CC','CCC','CD','D','DC','DCC','DCCC','CM',
             '','X','XX','XXX','XL','L','LX','LXX','LXXX','XC',
             '','I','II','III','IV','V','VI','VII','VIII','IX'];
  var roman = '', i = 3;
  while (i--) roman = (key[+digits.pop() + (i * 10)] || '') + roman;
  return Array(+digits.join('') + 1).join('M') + roman;
}

function deromanize (str) {
  var str = str.toUpperCase();
  var validator = /^M*(?:D?C{0,3}|C[MD])(?:L?X{0,3}|X[CL])(?:V?I{0,3}|I[XV])$/;
  var token = /[MDLV]|C[MD]?|X[CL]?|I[XV]?/g;
  var key = {M:1000,CM:900,D:500,CD:400,C:100,XC:90,L:50,XL:40,X:10,IX:9,V:5,IV:4,I:1};
  var num = 0, m;
  if (!(str && validator.test(str))) return false;
  while (m = token.exec(str)) num += key[m[0]];
  return num;
}

How would you rewrite this code? Can you create a shorter version?

Regular Expressions As Functions

Firefox includes a non-standard JavaScript extension that makes regular expressions callable as functions. This serves as a shorthand for calling a regex's exec method. For example, in Firefox /regex/("string") is equivalent to /regex/.exec("string"). Early ECMAScript 4 proposals indicated this functionality would be added to the ES4 specification, but subsequent discussion on the ES4-discuss mailing list suggests it might be dropped.

However, you can implement something similar by adding call and apply methods to RegExp.prototype, which could help with functional programming and duck-typed code that works with both functions and regular expressions. So let's add them:

RegExp.prototype.call = function (context, str) {
	return this.exec(str);
};
RegExp.prototype.apply = function (context, args) {
	return this.exec(args[0]);
};

Note that both of the above methods completely ignore the context argument. You could pass in null or whatever else as the context, and you'd get back the normal result of running exec on the regex. Using the above methods, you can generically work with both regular expressions and functions wherever it's convenient to do so. A few obvious cases where this could be helpful are the JavaScript 1.6 array iteration methods. Following are implementations of filter, every, some, and map that allow them to be used cross-browser:

// Returns an array with the elements of an existng array for which the provided filtering function returns true
Array.prototype.filter = function (func, context) {
	var results = [];
	for (var i = 0; i < this.length; i++) {
		if (i in this && func.call(context, this[i], i, this))
			results.push(this[i]);
	}
	return results;
};
// Returns true if every element in the array satisfies the provided testing function
Array.prototype.every = function (func, context) {
	for (var i = 0; i < this.length; i++) {
		if (i in this && !func.call(context, this[i], i, this))
			return false;
	}
	return true;
};
// Returns true if at least one element in the array satisfies the provided testing function
Array.prototype.some = function (func, context) {
	for (var i = 0; i < this.length; i++) {
		if (i in this && func.call(context, this[i], i, this))
			return true;
	}
	return false;
};
// Returns an array with the results of calling the provided function on every element in the provided array
Array.prototype.map = function (func, context) {
	var results = [];
	for (var i = 0; i < this.length; i++) {
		if (i in this)
			results[i] = func.call(context, this[i], i, this);
	}
	return results;
};

Because the array and null values returned by exec type-convert nicely to true and false, the above code allows you to use something like ["a","b","ab","ba"].filter(/^a/) to return all values that start with "a": ["a","ab"]. The code ["1",1,0,"a",3.1,256].filter(/^[1-9]\d*$/) would return integers greater than zero, regardless of type: ["1",1,256]. str.match(/a?b/g).filter(/^b/) would return all matches of "b" not preceded by "a". This can be a convenient pattern since JavaScript doesn't support lookbehind.

All of the above examples already work with Firefox's native Array.prototype.filter because of the indirect exec calling feature in that browser, but they wouldn't work with the cross-browser implementation of filter above without adding RegExp.prototype.call.

Does this seem like something that would be useful to you? Can you think of other good examples where call and apply methods would be useful for regular expressions?

Update: This post has been translated into Chinese by PlanABC.net.

Timed Memoization

Certain operations are computationally expensive, but because their results might change over time or due to outside influences, they don't lend themselves to typical memoization — take for example getElementsByClassName. Here's a JavaScript timed memoization decorator / higher-order-function I made to help with these cases, which accepts an optional expiration argument in milliseconds.

function memoize (functor, expiration) {
	var memo = {};
	return function () {
		var key = Array.prototype.join.call(arguments, "§");
		if (key in memo)
			return memo[key];
		if (expiration)
			setTimeout(function () {delete memo[key];}, expiration);
		return memo[key] = functor.apply(this, arguments);
	};
}

This approach allows you to turn any function into a memoizing function. Note that return values are memoized for each set of arguments. However, due to technical constraints it's only reliable when the arguments are arrays or scalar values, but you could easily use e.g. a toJSON method rather than join to serialize objects as part of the cache key (at some additional overhead cost).

You can use the above code like this:

// Make a function which memoizes for 1000 milliseconds at a time
var fn = memoize(function () {
	Array(500000).join("."); // slow
	return true;
}, 1000);

…Or leave out the expiration argument to permanently memoize.

Here are a couple more posts on JavaScript memoization:

One-Line JavaScript Memoization by Oliver Steele
Memoization in JavaScript by Keith Gaughan

JavaScript, Regex, and Unicode

Not all shorthand character classes and other JavaScript regex syntax is Unicode-aware. In some cases it can be important to know exactly what certain tokens match, and that's what this post will explore.

According to ECMA-262 3rd Edition, \s, \S, ., ^, and $ use Unicode-based interpretations of whitespace and newline, while \d, \D, \w, \W, \b, and \B use ASCII-only interpretations of digit, word character, and word boundary (e.g. /a\b/.test("naïve") returns true). Actual browser implementations often differ on these points. For example, Firefox 2 considers \d and \D to be Unicode-aware, while Firefox 3 fixes this bug — making \d equivalent to [0-9] as with most other browsers.

Here again are the affected tokens, along with their definitions:

\d — Digits.
\s — Whitespace.
\w — Word characters.
\D — All except digits.
\S — All except whitespace.
\W — All except word characters.
. — All except newlines.
^ (with /m) — The positions at the beginning of the string and just after newlines.
$ (with /m) — The positions at the end of the string and just before newlines.
\b — Word boundary positions.
\B — Not word boundary positions.

All of the above are standard in Perl-derivative regex flavors. However, the meaning of the terms digit, whitespace, word character, word boundary, and newline depend on the regex flavor, character set, and platform you're using, so here are the official JavaScript meanings as they apply to regexes:

Digit — The characters 0-9 only.
Whitespace — Tab, line feed, vertical tab, form feed, carriage return, space, no-break space, line separator, paragraph separator, and "any other Unicode 'space separator'".
Word character — The characters A-Z, a-z, 0-9, and _ only.
Word boundary — The position between a word character and non-word character.
Newline — The line feed, carriage return, line separator, and paragraph separator characters.

Here again are the newline characters, with their character codes:

\u000a — Line feed — \n
\u000d — Carriage return — \r
\u2028 — Line separator
\u2029 — Paragraph separator

Note that ECMAScript 4 proposals indicate that the C1/Unicode NEL "next line" control character (\u0085) will be recognized as an additional newline character in that standard. Also note that although CRLF (a carriage return followed by a line feed) is treated as a single newline sequence in most contexts, /\r^$\n/m.test("\r\n") returns true.

As for whitespace, ECMA-262 3rd Edition uses an interpretation based on Unicode's Basic Multilingual Plane, from version 2.1 or later of the Unicode standard. Following are the characters which should be matched by \s according to ECMA-262 3rd Edition and Unicode 5.1:

\u0009 — Tab — \t
\u000a — Line feed — \n — (newline character)
\u000b — Vertical tab — \v
\u000c — Form feed — \f
\u000d — Carriage return — \r — (newline character)
\u0020 — Space
\u00a0 — No-break space
\u1680 — Ogham space mark
\u180e — Mongolian vowel separator
\u2000 — En quad
\u2001 — Em quad
\u2002 — En space
\u2003 — Em space
\u2004 — Three-per-em space
\u2005 — Four-per-em space
\u2006 — Six-per-em space
\u2007 — Figure space
\u2008 — Punctuation space
\u2009 — Thin space
\u200a — Hair space
\u2028 — Line separator — (newline character)
\u2029 — Paragraph separator — (newline character)
\u202f — Narrow no-break space
\u205f — Medium mathematical space
\u3000 — Ideographic space

To test which characters or positions are matched by all of the tokens mentioned here in your browser, see JavaScript Regex and Unicode Tests. Note that Firefox 2.0.0.11, IE 7, and Safari 3.0.3 beta all get some of the tests wrong.

Update: My new Unicode plugin for XRegExp allows you to easily match Unicode categories, scripts, and blocks in JavaScript regular expressions.

JavaScript Password Validation

Since I've seen tons of password validation help requests on regexadvice.com (where I hang out from time to time), I've written up a more general-purpose JavaScript password validation function. It's reasonably straightforward, and covers the validation requirements I've most frequently encountered. Plus, if it doesn't handle your exact needs, its functionality can be augmented by passing it custom functions and regular expressions.

Here are the validation types supported out of the box. All are optional, which means that all passwords are allowed by default.

Minimum and maximum length.
Minimum n lowercase characters (a–z).
Minimum n uppercase characters (A–Z).
Minimum n combined a–z and A–Z characters.
Minimum n numeric characters (0–9).
Minimum n special characters (characters other than a–z, A–Z, and 0–9).
Ban particular words (tested case-insensitively).
Ban n-length character sequences (e.g. "abc", "XYZ", or "789", with a sequence length of 3; does not apply to special characters).
Ban n-length qwerty character sequences (e.g. "qwerty" or "asdf", with a sequence length of 4; does not apply to special characters).
Ban sequential, identical characters (e.g. "aa" or "!!").
Use custom regular expressions (tested using RegExp.prototype.test) and functions (the password is provided as the first argument, and a Boolean value is expected in return).

Here's an example of how it can be used:

var password = "password";
var passed = validatePassword(password, {
	length:   [8, Infinity],
	lower:    1,
	upper:    1,
	numeric:  1,
	special:  1,
	badWords: ["password", "steven", "levithan"],
	badSequenceLength: 4
});
// passed: false

The above requires that password is at least eight characters long; has at least one lowercase, uppercase, numeric, and special character; doesn't include the words "password", "steven", or "levithan"; and doesn't include an alphanumeric sequence four or more characters in length (e.g. "1234").

Here's the code (there are no external library dependencies):

/*
	Password Validator 0.1
	(c) 2007 Steven Levithan <stevenlevithan.com>
	MIT License
*/

function validatePassword (pw, options) {
	// default options (allows any password)
	var o = {
		lower:    0,
		upper:    0,
		alpha:    0, /* lower + upper */
		numeric:  0,
		special:  0,
		length:   [0, Infinity],
		custom:   [ /* regexes and/or functions */ ],
		badWords: [],
		badSequenceLength: 0,
		noQwertySequences: false,
		noSequential:      false
	};

	for (var property in options)
		o[property] = options[property];

	var	re = {
			lower:   /[a-z]/g,
			upper:   /[A-Z]/g,
			alpha:   /[A-Z]/gi,
			numeric: /[0-9]/g,
			special: /[\W_]/g
		},
		rule, i;

	// enforce min/max length
	if (pw.length < o.length[0] || pw.length > o.length[1])
		return false;

	// enforce lower/upper/alpha/numeric/special rules
	for (rule in re) {
		if ((pw.match(re[rule]) || []).length < o[rule])
			return false;
	}

	// enforce word ban (case insensitive)
	for (i = 0; i < o.badWords.length; i++) {
		if (pw.toLowerCase().indexOf(o.badWords[i].toLowerCase()) > -1)
			return false;
	}

	// enforce the no sequential, identical characters rule
	if (o.noSequential && /([\S\s])\1/.test(pw))
		return false;

	// enforce alphanumeric/qwerty sequence ban rules
	if (o.badSequenceLength) {
		var	lower   = "abcdefghijklmnopqrstuvwxyz",
			upper   = lower.toUpperCase(),
			numbers = "0123456789",
			qwerty  = "qwertyuiopasdfghjklzxcvbnm",
			start   = o.badSequenceLength - 1,
			seq     = "_" + pw.slice(0, start);
		for (i = start; i < pw.length; i++) {
			seq = seq.slice(1) + pw.charAt(i);
			if (
				lower.indexOf(seq)   > -1 ||
				upper.indexOf(seq)   > -1 ||
				numbers.indexOf(seq) > -1 ||
				(o.noQwertySequences && qwerty.indexOf(seq) > -1)
			) {
				return false;
			}
		}
	}

	// enforce custom regex/function rules
	for (i = 0; i < o.custom.length; i++) {
		rule = o.custom[i];
		if (rule instanceof RegExp) {
			if (!rule.test(pw))
				return false;
		} else if (rule instanceof Function) {
			if (!rule(pw))
				return false;
		}
	}

	// great success!
	return true;
}

You can download it here.

Lemme know if you have any feature requests or other suggestions about how to improve it, or if you need help writing custom rules for it.