Flagrant Badassery

A JavaScript and regular expression centric blog

Unicode

Unicode Plugin for XRegExp

I've released a simple plugin for XRegExp (my JavaScript regex library) that adds support for Unicode properties and blocks to JavaScript regular expressions. It uses the Unicode 5.1 character database, which is the latest version. The Unicode plugin enables the following Unicode properties/categories in any XRegExp:

\p{L}: Letter
\p{M}: Mark
\p{N}: Number
\p{P}: Punctuation
\p{S}: [...]
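To give a feel for what these categories match, here is a minimal sketch. Note the assumption: it does not use the XRegExp plugin at all, but the native `\p{...}` property escapes that later JavaScript engines (ES2018 and up) accept when the `u` flag is set. The helper name `extractWords` and the sample string are made up for illustration.

```javascript
// Assumption: native Unicode property escapes (ES2018, `u` flag) stand in
// for the XRegExp plugin here; the plugin's own API is not shown.

// One or more characters from the Letter category.
const letterRun = /\p{L}+/gu;

// Hypothetical helper: returns every run of letters in the text.
function extractWords(text) {
  return text.match(letterRun) || [];
}

// "naïve café, 東京 2008!" written with escapes to avoid encoding issues.
const words = extractWords('na\u00EFve caf\u00E9, \u6771\u4EAC 2008!');

// A combining acute accent (U+0301) belongs to the Mark category.
const hasMark = /\p{M}/u.test('e\u0301');

// Digits fall under Number, the comma and "!" under Punctuation.
const hasNumber = /\p{N}/u.test('2008');
const hasPunctuation = /\p{P}/u.test(',');

console.log(words, hasMark, hasNumber, hasPunctuation);
```

The accented and CJK words come back whole because they consist entirely of Letter characters, while the digits and punctuation are skipped.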


JavaScript, Regex, and Unicode

Not all shorthand character classes and other JavaScript regex syntax are Unicode-aware. In some cases it can be important to know exactly what certain tokens match, and that's what this post will explore. According to ECMA-262 3rd Edition, \s, \S, ., ^, and $ use Unicode-based interpretations of whitespace and newline, while \d, \D, \w, \W, [...]
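A quick way to see the split is to test a few non-ASCII characters against each token. This is only a sketch: the variable names are made up, the sample characters are my own picks, and it covers just the tokens named above, run without the `u` flag. Older browsers may deviate from the spec on some of these.

```javascript
// Whitespace: \s follows the Unicode definition, so it matches
// no-break space (U+00A0) and em space (U+2003).
const sMatchesNbsp = /\s/.test('\u00A0');
const sMatchesEmSpace = /\s/.test('\u2003');

// Newlines: the dot excludes the Unicode line separator (U+2028),
// and with the m flag ^ and $ treat it as a line break.
const dotMatchesLineSep = /./.test('\u2028');
const anchorsSeeLineSep = /^b$/m.test('a\u2028b');

// Digits and word characters: \d and \w are ASCII-only, so they miss
// ARABIC-INDIC DIGIT THREE (U+0663) and "é" (U+00E9).
const dMatchesArabicDigit = /\d/.test('\u0663');
const wMatchesAccented = /\w/.test('\u00E9');

console.log({
  sMatchesNbsp,
  sMatchesEmSpace,
  dotMatchesLineSep,
  anchorsSeeLineSep,
  dMatchesArabicDigit,
  wMatchesAccented
});
```

In a spec-compliant engine the first, second, and fourth checks are true and the rest are false.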
