<?xml version="1.0" encoding="UTF-8"?>
<!-- generator="wordpress/2.3.3" -->
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	>

<channel>
	<title>Flagrant Badassery &#187; JavaScript</title>
	<link>http://blog.stevenlevithan.com</link>
	<description>A JavaScript and regular expression centric blog</description>
	<pubDate>Sat, 17 May 2008 03:13:18 +0000</pubDate>
	<generator>http://wordpress.org/?v=2.3.3</generator>
	<language>en</language>
			<item>
		<title>Test Your XRegExps with JRX</title>
		<link>http://blog.stevenlevithan.com/archives/jrx-xregexp</link>
		<comments>http://blog.stevenlevithan.com/archives/jrx-xregexp#comments</comments>
		<pubDate>Fri, 16 May 2008 09:38:13 +0000</pubDate>
		<dc:creator>Steve</dc:creator>
		
		<category><![CDATA[JavaScript]]></category>

		<category><![CDATA[Regular Expressions]]></category>

		<category><![CDATA[xregexp]]></category>

		<guid isPermaLink="false">http://blog.stevenlevithan.com/archives/jrx-xregexp</guid>
		<description><![CDATA[Cüneyt Yılmaz's JRX is a cool JavaScript regex tester inspired by the RX tool of Komodo IDE. Cüneyt recently added my XRegExp library to his tester, so JRX is now a nice and easy way to test XRegExp's singleline and extended modes, as well as named capture and other XRegExp-provided syntax. Check it out!

As for [...]]]></description>
			<content:encoded><![CDATA[<p>Cüneyt Yılmaz's <strong><a href="http://cuneytyilmaz.com/prog/jrx/">JRX</a></strong> is a cool JavaScript regex tester inspired by the RX tool of <a href="http://www.activestate.com/Products/komodo_ide/">Komodo IDE</a>. Cüneyt recently added my <a href="http://stevenlevithan.com/regex/xregexp/">XRegExp library</a> to his tester, so JRX is now a nice and easy way to test XRegExp's singleline and extended modes, as well as named capture and other XRegExp-provided syntax. <a href="http://cuneytyilmaz.com/prog/jrx/">Check it out!</a></p>

<p>As for XRegExp, it has recently been upgraded to v0.5.2, which resolved a corner-case bug involving <code>XRegExp.matchRecursive</code>. See the <a href="http://stevenlevithan.com/regex/xregexp/history.html">changelog</a> for details.</p>

<p>I'll take this opportunity to highlight some of my other favorite online regex testers. I've actively looked for these kinds of apps over the years and have probably seen close to a hundred of them. Odds are you'll find something new here.</p>

<ul>
	<li><a href="http://regexpal.com">RegexPal</a> &mdash; My own JavaScript regex tester. It includes real-time regex syntax and match highlighting. Although RegexPal uses XRegExp to provide the singleline option, unlike JRX it uses JavaScript regex syntax without the XRegExp syntax extensions.</li>
	<li><a href="http://regex.larsolavtorvik.com">regex</a> &mdash; Simple name, simple interface, and a great set of supported flavors: JavaScript, Perl, Python, PHP/preg (PCRE), and PHP/ereg (POSIX ERE).</li>
	<li><a href="http://osteele.com/tools/rework/">reWork</a> &mdash; Feature-rich JavaScript regex workbench.</li>
	<li><a href="http://osteele.com/tools/reanimator/">reAnimator</a> &mdash; Fun app for visualizing regexes as finite-state automata (FSAs), by the same author as reWork.</li>
	<li><a href="http://regexmate.com">RegexMate</a> &mdash; JavaScript regex console.</li>
	<li><a href="http://www.contentbox.com/claude/REwizard/">The REWizard</a> &mdash; IE only, but offers regex building tools and an interesting visualization.</li>
	<li><a href="http://www.myregextester.com">MyRegexTester</a> &mdash; Includes code generation and plain-text explanations (via the YAPE::Regex::Explain Perl module).</li>
	<li><a href="http://regexp.resource.googlepages.com/analyzer.html">Regular Expression Analyzer</a> &mdash; Real-time regex explanation tree that emulates Java, JavaScript, and Perl flavors. Its <a href="http://regexp.resource.googlepages.com/RegEx3.js">regex parsing code</a> is very readable.</li>
	<li><a href="http://www.nregex.com">Nregex</a> &mdash; .NET regex tester.</li>
	<li><a href="http://www.rubular.com">Rubular</a> &mdash; Ruby regex tester.</li>
	<li><a href="http://www.fileformat.info/tool/regex.htm">FileFormat regex tester</a> &mdash; Java regex tester.</li>
</ul>

<p>Have fun!</p>]]></content:encoded>
			<wfw:commentRss>http://blog.stevenlevithan.com/archives/jrx-xregexp/feed</wfw:commentRss>
		</item>
		<item>
		<title>XRegExp 0.5 Released!</title>
		<link>http://blog.stevenlevithan.com/archives/xregexp-0-5</link>
		<comments>http://blog.stevenlevithan.com/archives/xregexp-0-5#comments</comments>
		<pubDate>Mon, 21 Apr 2008 00:43:35 +0000</pubDate>
		<dc:creator>Steve</dc:creator>
		
		<category><![CDATA[JavaScript]]></category>

		<category><![CDATA[Project Releases]]></category>

		<category><![CDATA[Regular Expressions]]></category>

		<category><![CDATA[recursion]]></category>

		<category><![CDATA[xregexp]]></category>

		<guid isPermaLink="false">http://blog.stevenlevithan.com/archives/xregexp-0-5</guid>
		<description><![CDATA[If you haven't seen the prior versions, XRegExp is an MIT-licensed JavaScript library that provides an augmented, cross-browser implementation of regular expressions, including support for additional modifiers and syntax. Several convenience methods and a new, powerful recursive-construct parser that uses regex delimiters are also included.

Here's what you get beyond the standard JavaScript regex features:


	Added regex [...]]]></description>
			<content:encoded><![CDATA[<p>If you haven't seen the prior versions, XRegExp is an MIT-licensed JavaScript library that provides an augmented, cross-browser implementation of regular expressions, including support for additional modifiers and syntax. Several convenience methods and a new, powerful recursive-construct parser that uses regex delimiters are also included.</p>

<p>Here's what you get beyond the standard JavaScript regex features:</p>

<ul>
	<li>Added regex syntax:
		<ul>
			<li>Comprehensive named capture support. <strong class="small">(Improved)</strong></li>
			<li>Comment patterns: <code>(?#…)</code>. <strong class="small">(New)</strong></li>
		</ul>
	</li>
	<li>Added regex modifiers (flags):
		<ul>
			<li><code>s</code> (<em>singleline</em>), to make dot match all characters including newlines.</li>
			<li><code>x</code> (<em>extended</em>), for free-spacing and comments.</li>

		</ul>
	</li>
	<li>Added awesome:
		<ul>
			<li>Reduced cross-browser inconsistencies. <strong class="small">(More)</strong></li>
			<li>Recursive-construct parser with regex delimiters. <strong class="small">(New)</strong></li>
			<li>An easy way to cache and reuse regex objects. <strong class="small">(New)</strong></li>
			<li>The ability to safely embed literal text in your regex patterns. <strong class="small">(New)</strong></li>
			<li>A method to add modifiers to existing regex objects.</li>
			<li>Regex <code>call</code> and <code>apply</code> methods, which make generically working with functions and regexes easier. <strong class="small">(New)</strong></li>
		</ul>
	</li>
</ul>

<p>All of this can be yours for the low, low price of 2.4 KB. <img src="http://blog.stevenlevithan.com/wp-includes/images/smilies/icon_smile.gif" alt="smile" /> Version 0.5 also introduces extensive documentation with lots of code examples. See: <a href="http://stevenlevithan.com/regex/xregexp/"><strong>XRegExp: JavaScript regular expression library</strong></a>.</p>
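<p>For anyone curious how a singleline mode can be layered on top of native regexes, here is a minimal sketch in plain JavaScript. It is not XRegExp's source, and the helper names <code>emulateSingleline</code> and <code>escapeLiteral</code> are hypothetical. The first rewrites each unescaped dot outside a character class as <code>[\s\S]</code>, a class that matches any character, newlines included; the second escapes regex metacharacters so that arbitrary text can be embedded in a pattern as a literal.</p>

```javascript
// Hypothetical helper (not XRegExp's code): rewrite a pattern so that "."
// also matches newlines. Escaped dots (\.) and dots inside a character
// class ([.]) are literal characters, so they are copied unchanged.
function emulateSingleline(source) {
  var output = "",
    inClass = false;
  for (var i = 0; i < source.length; i++) {
    var ch = source.charAt(i);
    if (ch === "\\") {
      // Copy the backslash and the character it escapes as a unit
      output += ch + source.charAt(i + 1);
      i++;
    } else if (inClass) {
      // Per ES3, the first unescaped "]" closes the class, even right after "["
      if (ch === "]") inClass = false;
      output += ch;
    } else if (ch === "[") {
      inClass = true;
      output += ch;
    } else if (ch === ".") {
      output += "[\\s\\S]"; // any whitespace or non-whitespace character
    } else {
      output += ch;
    }
  }
  return output;
}

// Hypothetical helper: backslash-escape regex metacharacters (plus "#" and
// whitespace, which matter in free-spacing mode) so text can be used literally
function escapeLiteral(text) {
  return String(text).replace(/[-[\]{}()*+?.,\\^$|#\s]/g, "\\$&");
}

var dotAll = new RegExp(emulateSingleline("a.c"));
console.log(/a.c/.test("a\nc")); // false
console.log(dotAll.test("a\nc")); // true

var literal = new RegExp("^" + escapeLiteral("1+1=2?") + "$");
console.log(literal.test("1+1=2?")); // true
```

<p>Copying escape sequences as a unit is what keeps <code>\.</code> literal; a fuller version would also need to handle the <code>x</code> modifier's whitespace and comments, which this sketch ignores.</p>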

<p>If you're using a previous version, note that there are a few backward-incompatible changes, made for the sake of strict ECMA-262 Edition 3 compliance and compatibility with upcoming ECMAScript 4 changes.</p>

<ul>
	<li>The <code>XRegExp.overrideNative</code> function has been removed, since it is no longer possible to override native constructors in Firefox 3 or ECMAScript 4 (as proposed).</li>
	<li>Named capture syntax has been changed from <code>(&lt;name&gt;&hellip;)</code> to <code>(?&lt;name&gt;&hellip;)</code>, which is the standard in most regex libraries and under consideration for ES4. Named capture is now always available, and does not require the <code>k</code> modifier.</li>
	<li>Due to cross-browser compatibility issues, previous versions treated a leading, unescaped <code>]</code> within a character class as a literal character, which is how things work in most regex flavors. XRegExp now follows ECMA-262 Edition 3 on this point: <code>[]</code> is an empty set and never matches (this is enforced in all browsers).</li>
</ul>
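<p>The ES3 rule in the last bullet can be checked directly in any engine that follows the spec, as modern engines do. This is plain JavaScript with no XRegExp involved: <code>[]</code> can never match, and its negation <code>[^]</code> therefore matches any single character, newlines included.</p>

```javascript
// Under ECMA-262, [] is an empty set: it matches nothing at all
var emptySet = /[]/;
// Its negation, [^], matches any character, including "\n"
var anyChar = /[^]/;

console.log(emptySet.test("abc")); // false
console.log(anyChar.test("\n")); // true

// A leading "]" is not a literal here: /[]]/ is the empty set followed by
// a literal "]", so it cannot match "]" either
console.log(/[]]/.test("]")); // false
```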

<p>Get it while it's hot! Check out the new <a href="http://stevenlevithan.com/regex/xregexp/">XRegExp documentation</a> and <a href="http://stevenlevithan.com/regex/xregexp/xregexp.js">source code</a>.</p>]]></content:encoded>
			<wfw:commentRss>http://blog.stevenlevithan.com/archives/xregexp-0-5/feed</wfw:commentRss>
		</item>
		<item>
		<title>An IE lastIndex Bug with Zero-Length Regex Matches</title>
		<link>http://blog.stevenlevithan.com/archives/exec-bugs</link>
		<comments>http://blog.stevenlevithan.com/archives/exec-bugs#comments</comments>
		<pubDate>Mon, 14 Apr 2008 02:24:59 +0000</pubDate>
		<dc:creator>Steve</dc:creator>
		
		<category><![CDATA[Cross-Browser Issues]]></category>

		<category><![CDATA[JavaScript]]></category>

		<category><![CDATA[Regular Expressions]]></category>

		<guid isPermaLink="false">http://blog.stevenlevithan.com/archives/exec-bugs</guid>
		<description><![CDATA[The bottom line of this blog post is that Internet Explorer incorrectly increments a regex object's lastIndex property after a successful, zero-length match. However, for anyone who isn't sure what I'm talking about or is interested in how to work around the problem, I'll describe the issue with examples of iterating over each match in [...]]]></description>
			<content:encoded><![CDATA[<p>The bottom line of this blog post is that Internet Explorer incorrectly increments a regex object's <code>lastIndex</code> property after a successful, zero-length match. However, for anyone who isn't sure what I'm talking about or is interested in how to work around the problem, I'll describe the issue with examples of iterating over each match in a string using the <code>RegExp.prototype.exec</code> method. That's where I've most frequently encountered the bug, and I think it will help explain why the issue exists in the first place.</p>

<p>First of all, if you're not already familiar with how to use <code>exec</code> to iterate over a string, you're missing out on some very powerful functionality. Here's the basic construct:</p>

<pre class="code">var	regex = /.../g,
	subject = "test",
	match = regex.exec(subject);

while (match != null) {
	<span class="comment">// matched text: match[0]
	// match start: match.index
	// match end: regex.lastIndex
	// capturing group n: match[n]</span>

	<span class="comment">// ...do something with the match here...</span>

	match = regex.exec(subject);
}
</pre>

<p>When the <code>exec</code> method is called for a regex that uses the <code>/g</code> (global) modifier, it searches from the point in the subject string specified by the regex's <code>lastIndex</code> property (which is initially zero, so it searches from the beginning of the string). If the <code>exec</code> method finds a match, it updates the regex's <code>lastIndex</code> property to the character index at the end of the match, and returns an array containing the matched text and any captured subexpressions. If there is no match from the point in the string where the search started, <code>lastIndex</code> is reset to zero, and <code>null</code> is returned.</p>
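<p>To see these rules in action, the sketch below runs the basic construct with a concrete pattern and records <code>lastIndex</code> after each call. The pattern <code>/\d+/g</code> and the subject string are arbitrary picks for illustration, and the recorded values assume a spec-compliant engine.</p>

```javascript
var regex = /\d+/g,
  subject = "a1 b22 c333",
  log = [],
  match = regex.exec(subject);

while (match != null) {
  // Each successful exec call moves lastIndex to the end of its match
  log.push({ text: match[0], index: match.index, lastIndex: regex.lastIndex });
  match = regex.exec(subject);
}

console.log(log.length); // 3
// The final, failed search returned null and reset lastIndex to zero
console.log(regex.lastIndex); // 0
```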

<p>You can tighten up the above code by moving the <code>exec</code> method call into the <code>while</code> loop's condition, like so:</p>

<pre class="code">var	regex = /.../g,
	subject = "test",
	match;

while (match = regex.exec(subject)) {
	<span class="comment">// ...do something with the match here...</span>
}
</pre>

<p>This cleaner version works essentially the same as before. As soon as <code>exec</code> can't find any further matches and therefore returns <code>null</code>, the loop ends. However, there are a couple of cross-browser issues to be aware of with either version of this code. One is that if the regex contains capturing groups that do not participate in the match, some values in the returned array could be either <code>undefined</code> or an empty string, depending on the browser. I've previously discussed that issue in depth in a post about what I called <a href="http://blog.stevenlevithan.com/archives/npcg-javascript">non-participating capturing groups</a>.</p>
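<p>A minimal illustration of the non-participating group case follows. ECMA-262 specifies <code>undefined</code> for a group that takes no part in the match, and current engines comply; some older browsers (notably IE) returned an empty string instead. The pattern is an arbitrary example.</p>

```javascript
// In "b", the first alternative (a) fails, so group 1 never participates
var match = /(a)|b/.exec("b");

console.log(match[0]); // b
console.log(match[1] === undefined); // true in spec-compliant engines

// A truthiness test treats undefined and "" alike, so it is portable, but it
// cannot tell a non-participating group from a legitimately empty capture
if (!match[1]) {
  console.log("group 1 captured no text");
}
```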

<p>Another issue (the topic of <em>this</em> post) occurs when your regex matches an empty string. There are many reasons why you might allow a regex to do that, but if you can't think of any, consider cases where you're accepting regexes from an outside source. Here's a simple example of such a regex:</p>

<pre class="code">var	regex = /^/gm,
	subject = "A\nB\nC",
	match,
	endPositions = [];

while (match = regex.exec(subject)) {
	endPositions.push(regex.lastIndex);
}
</pre>

<p>You might expect the <code>endPositions</code> array to be set to <code>[0,2,4]</code>, since those are the character positions for the beginning of the string and just after each newline character. Thanks to the <code>/m</code> modifier, those are the positions where the regex will match; and since the regex matches empty strings, <code>regex.lastIndex</code> should be the same as <code>match.index</code>. However, Internet Explorer (tested with v5.5&ndash;7) sets <code>endPositions</code> to <code>[1,3,5]</code>. Other browsers will go into an infinite loop until you short-circuit the code.</p>

<p>So what's going on here? Remember that every time <code>exec</code> runs, it attempts to match within the subject string starting at the position specified by the <code>lastIndex</code> property of the regex. Since our regex matches a zero-length string, <code>lastIndex</code> remains exactly where we started the search. Therefore, every time through the loop our regex will match at the same position: the start of the string. Internet Explorer tries to be helpful and avoid this situation by automatically incrementing <code>lastIndex</code> when a zero-length string is matched. That might seem like a good idea (in fact, I've seen people adamantly argue that it's a bug that Firefox does not do the same), but it means that in Internet Explorer the <code>lastIndex</code> property cannot be relied on to accurately determine the ending position of a match.</p>
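<p>The stuck position can be observed without hanging the page by capping the number of iterations. In a spec-compliant engine, the sketch below records the same <code>lastIndex</code> on every pass; the cap of five iterations is an arbitrary choice.</p>

```javascript
var regex = /^/gm,
  subject = "A\nB\nC",
  positions = [],
  match;

// Stop after five passes so the demonstration cannot loop forever
for (var n = 0; n < 5 && (match = regex.exec(subject)); n++) {
  // A zero-length match leaves lastIndex where the search started
  positions.push(regex.lastIndex);
}

console.log(positions.join(",")); // 0,0,0,0,0
```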

<p>We can correct this situation cross-browser with the following code:</p>

<pre class="code">var	regex = /^/gm,
	subject = "A\nB\nC",
	match,
	endPositions = [];

while (match = regex.exec(subject)) {
	var zeroLengthMatch = !match[0].length;
	<span class="comment">// Fix IE's incorrect lastIndex</span>
	if (zeroLengthMatch &amp;&amp; regex.lastIndex > match.index)
		regex.lastIndex--;

	endPositions.push(regex.lastIndex);

	<span class="comment">// Avoid an infinite loop with zero-length matches</span>
	if (zeroLengthMatch)
		regex.lastIndex++;
}
</pre>

<p>You can see an example of the above code in the <a href="http://blog.stevenlevithan.com/archives/cross-browser-split">cross-browser split method</a> I posted a while back. Keep in mind that none of the extra code here is needed if your regex cannot possibly match an empty string.</p>
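<p>If you need this loop in several places, the workaround can be wrapped in a small helper. The name <code>forEachMatch</code> and its callback signature are hypothetical, not taken from any library; the sketch simply packages the IE correction and the infinite-loop guard shown above, and refuses non-global regexes, which would never advance.</p>

```javascript
// Hypothetical helper: call fn(match, endIndex) for every match of a global
// regex, correcting IE's lastIndex and stepping past zero-length matches
function forEachMatch(regex, subject, fn) {
  if (!regex.global)
    throw new TypeError("forEachMatch requires a regex with the /g modifier");
  var match;
  regex.lastIndex = 0; // always start from the beginning of the subject
  while ((match = regex.exec(subject)) !== null) {
    var zeroLengthMatch = !match[0].length;
    // Fix IE's incorrect lastIndex after a zero-length match
    if (zeroLengthMatch && regex.lastIndex > match.index)
      regex.lastIndex--;
    fn(match, regex.lastIndex);
    // Avoid an infinite loop with zero-length matches
    if (zeroLengthMatch)
      regex.lastIndex++;
  }
}

var endPositions = [];
forEachMatch(/^/gm, "A\nB\nC", function (match, end) {
  endPositions.push(end);
});
console.log(endPositions.join(",")); // 0,2,4
```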

<p>Another way to deal with this issue is to use <code>String.prototype.replace</code> to iterate over the subject string. The <code>replace</code> method moves forward automatically after zero-length matches, avoiding this issue altogether. Unfortunately, in the three biggest browsers (IE, Firefox, Safari), <code>replace</code> doesn't seem to deal with the <code>lastIndex</code> property except to reset it to zero. Opera gets it right (according to my reading of the spec) and updates <code>lastIndex</code> along the way. Given the current situation, you can't rely on <code>lastIndex</code> in your code when iterating over a string using <code>replace</code>, but you can still easily derive the value for the end of each match. Here's an example:</p>

<pre class="code">var	regex = /^/gm,
	subject = "A\nB\nC",
	endPositions = [];

subject.replace(regex, function (match) {
	<span class="comment">// Not using a named argument for the index since capturing
	// groups can change its position in the list of arguments</span>
	var	index = arguments[arguments.length - 2],
		lastIndex = index + match.length;

	endPositions.push(lastIndex);
});
</pre>

<p>That's perhaps less lucid than before (since we're not actually replacing anything), but there you have it&hellip; two cross-browser ways to get around a little-known issue that could otherwise cause tricky, latent bugs in your code.</p>]]></content:encoded>
			<wfw:commentRss>http://blog.stevenlevithan.com/archives/exec-bugs/feed</wfw:commentRss>
		</item>
		<item>
		<title>A JScript/VBScript Regex Lookahead Bug</title>
		<link>http://blog.stevenlevithan.com/archives/regex-lookahead-bug</link>
		<comments>http://blog.stevenlevithan.com/archives/regex-lookahead-bug#comments</comments>
		<pubDate>Mon, 24 Mar 2008 05:50:58 +0000</pubDate>
		<dc:creator>Steve</dc:creator>
		
		<category><![CDATA[Cross-Browser Issues]]></category>

		<category><![CDATA[JavaScript]]></category>

		<category><![CDATA[Regular Expressions]]></category>

		<category><![CDATA[VBScript]]></category>

		<guid isPermaLink="false">http://blog.stevenlevithan.com/archives/regex-lookahead-bug</guid>
		<description><![CDATA[Here's one of the oddest and most significant regex bugs in Internet Explorer. It can appear when using optional elision within lookahead (e.g., via ?, *, {0,n}, or (.&#124;), but not +, interval quantifiers starting from one or higher, or alternation without a zero-length option). An example in JavaScript:

/(?=a?b)ab/.test("ab");
// Should return true, but IE 5.5 [...]]]></description>
			<content:encoded><![CDATA[<p>Here's one of the oddest and most significant regex bugs in Internet Explorer. It can appear when using optional elision within lookahead (e.g., via <code>?</code>, <code>*</code>, <code>{0,<em>n</em>}</code>, or <code>(.|)</code>, but not <code>+</code>, interval quantifiers starting from one or higher, or alternation without a zero-length option). An example in JavaScript:</p>

<pre class="code"><span class="regex">/(?=a?b)ab/</span>.test("ab");
<span class="comment">// Should return true, but IE 5.5 &ndash; 8b1 return false</span>

<span class="regex">/(?=a?b)ab/</span>.test("abc");
<span class="comment">// Correctly returns true (even in IE), although the
// added "c" does not take part in the match</span>
</pre>

<p>I've been aware of this bug for a couple of years, thanks to a <a href="http://regexadvice.com/blogs/mash/archive/2004/10/05/320.aspx">blog post by Michael Ash</a> that describes the bug with a password-complexity regex. However, the bug description there is incomplete and subtly incorrect, as shown by the reduced test case above. To be honest, although the errant behavior is predictable, it's a bit tricky to describe because I haven't yet figured out exactly what's happening internally. I'd recommend playing with variations of the above code to get a better understanding of the problem.</p>
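<p>One way to play with variations is a small table of patterns, subjects, and the results ECMA-262 requires, run in the browser under test. This is a sketch: the cases are illustrative picks built from the quantifiers listed above, plus Michael Ash's password regex and the reordered version discussed next. In a spec-compliant engine every case passes, so any mismatch a browser reports points to the bug.</p>

```javascript
// Each case: a pattern, a subject, and the result ECMA-262 requires
var cases = [
  [/(?=a?b)ab/, "ab", true],
  [/(?=a*b)ab/, "ab", true],
  [/(?=a{0,2}b)ab/, "ab", true],
  [/(?=(?:a|)b)ab/, "ab", true],
  [/(?=a+b)ab/, "ab", true], // + does not trigger the bug, per the text above
  [/(?=a?b)ab/, "abc", true],
  // Password regex from Michael Ash's post, and the reordered workaround
  [/^(?=.*\d)(?=.*[a-z])(?=.*[A-Z]).{8,15}$/, "Passw0rd", true],
  [/^(?=.{8,15}$)(?=.*\d)(?=.*[a-z])(?=.*[A-Z]).*/, "Passw0rd", true],
  [/^(?=.{8,15}$)(?=.*\d)(?=.*[a-z])(?=.*[A-Z]).*/, "password1", false]
];

// Return a description of every case whose actual result differs
function findMismatches(cases) {
  var mismatches = [];
  for (var i = 0; i < cases.length; i++) {
    var actual = cases[i][0].test(cases[i][1]);
    if (actual !== cases[i][2])
      mismatches.push(String(cases[i][0]) + " on \"" + cases[i][1] + "\": got " + actual);
  }
  return mismatches;
}

console.log(findMismatches(cases).length); // 0 in a spec-compliant engine
```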

<p>Fortunately, since the bug is predictable, it's usually possible to work around it. For example, you can avoid the bug with the password regex in Michael's post (<code>/^(?=.*\d)(?=.*[a-z])(?=.*[A-Z]).{8,15}$/</code>) by writing it as <code>/^(?=.{8,15}$)(?=.*\d)(?=.*[a-z])(?=.*[A-Z]).*/</code> (the <code>.{8,15}$</code> lookahead must come first here). The important thing is to be aware of the issue, because it can easily introduce latent, difficult-to-diagnose bugs into your code. Just remember that it shows up with variable-length lookahead. If you're using such patterns, test the hell out of them in IE.</p>]]></content:encoded>
			<wfw:commentRss>http://blog.stevenlevithan.com/archives/regex-lookahead-bug/feed</wfw:commentRss>
		</item>
		<item>
		<title>JavaScript Roman Numeral Converter</title>
		<link>http://blog.stevenlevithan.com/archives/javascript-roman-numeral-converter</link>
		<comments>http://blog.stevenlevithan.com/archives/javascript-roman-numeral-converter#comments</comments>
		<pubDate>Sun, 16 Mar 2008 04:59:51 +0000</pubDate>
		<dc:creator>Steve</dc:creator>
		
		<category><![CDATA[Code Challenge]]></category>

		<category><![CDATA[JavaScript]]></category>

		<guid isPermaLink="false">http://blog.stevenlevithan.com/archives/javascript-roman-numeral-converter</guid>
		<description><![CDATA[While looking for something quick to do during a brief internet outage, I wrote some code to convert to and from Roman numerals. Once things were back up I searched for equivalent code, but only found stuff that was multiple pages long, limited the range of what it could convert, or both. I figured I [...]]]></description>
			<content:encoded><![CDATA[<p>While looking for something quick to do during a brief internet outage, I wrote some code to convert to and from Roman numerals. Once things were back up I searched for equivalent code, but only found stuff that was multiple pages long, limited the range of what it could convert, or both. I figured I might as well share what I came up with:</p>

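<pre class="code">function romanize (num) {
	<span class="comment">// Only positive integers can be converted</span>
	num = +num;
	if (!(num > 0) || num % 1 !== 0)
		return false;
	<span class="comment">// key holds hundreds, tens, and ones in blocks of ten (index 0 = digit zero)</span>
	var	digits = String(num).split(""),
		key = ["","C","CC","CCC","CD","D","DC","DCC","DCCC","CM",
		       "","X","XX","XXX","XL","L","LX","LXX","LXXX","XC",
		       "","I","II","III","IV","V","VI","VII","VIII","IX"],
		roman = "",
		i = 3;
	<span class="comment">// Convert the last three digits (ones first); missing digits map to ""</span>
	while (i--)
		roman = (key[+digits.pop() + (i * 10)] || "") + roman;
	<span class="comment">// Any remaining digits give the number of thousands, written as Ms</span>
	return Array(+digits.join("") + 1).join("M") + roman;
}

function deromanize (str) {
	<span class="comment">// validator rejects malformed numerals such as IIII, VV, or IC;
	// token splits a numeral into symbols, keeping pairs like CM together</span>
	str = String(str).toUpperCase();
	var	validator = <span class="regex">/^M*(?:D?C{0,3}|C[MD])(?:L?X{0,3}|X[CL])(?:V?I{0,3}|I[XV])$/</span>,
		token = <span class="regex">/[MDLV]|C[MD]?|X[CL]?|I[XV]?/g</span>,
		key = {M:1000,CM:900,D:500,CD:400,C:100,XC:90,L:50,XL:40,X:10,IX:9,V:5,IV:4,I:1},
		num = 0, m;
	if (!(str &amp;&amp; validator.test(str)))
		return false;
	<span class="comment">// Add up the value of each token, left to right</span>
	while (m = token.exec(str))
		num += key[m[0]];
	return num;
}
</pre>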
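<p>For comparison, a common alternative structure is a greedy, table-driven conversion paired with a left-to-right subtractive scan for parsing. The sketch below is not a rewrite of the code above; the names <code>toRoman</code>, <code>fromRoman</code>, <code>ROMAN_TABLE</code>, and <code>VALUES</code> are hypothetical, and it reuses the validator pattern from <code>deromanize</code> so that malformed input such as <code>IIII</code> is still rejected.</p>

```javascript
// Hypothetical alternative: value/symbol pairs, largest first
var ROMAN_TABLE = [
  [1000, "M"], [900, "CM"], [500, "D"], [400, "CD"],
  [100, "C"], [90, "XC"], [50, "L"], [40, "XL"],
  [10, "X"], [9, "IX"], [5, "V"], [4, "IV"], [1, "I"]
];
var VALUES = { M: 1000, D: 500, C: 100, L: 50, X: 10, V: 5, I: 1 };
// Same well-formedness check as deromanize above
var VALID = /^M*(?:D?C{0,3}|C[MD])(?:L?X{0,3}|X[CL])(?:V?I{0,3}|I[XV])$/;

function toRoman(num) {
  num = +num;
  if (!(num > 0) || num % 1 !== 0) return false; // positive integers only
  var roman = "";
  // Greedily emit the largest symbol that still fits
  for (var i = 0; i < ROMAN_TABLE.length; i++) {
    while (num >= ROMAN_TABLE[i][0]) {
      roman += ROMAN_TABLE[i][1];
      num -= ROMAN_TABLE[i][0];
    }
  }
  return roman;
}

function fromRoman(str) {
  str = String(str).toUpperCase();
  if (!str || !VALID.test(str)) return false;
  var total = 0;
  for (var i = 0; i < str.length; i++) {
    var current = VALUES[str.charAt(i)],
      next = VALUES[str.charAt(i + 1)] || 0;
    // A smaller symbol before a larger one is subtracted (e.g., IV = 4)
    total += current < next ? -current : current;
  }
  return total;
}

console.log(toRoman(2008)); // MMVIII
console.log(fromRoman("MMVIII")); // 2008
```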

<p style="font-size:130%">How would <em>you</em> rewrite this code? Can you create a shorter version?</p>]]></content:encoded>
			<wfw:commentRss>http://blog.stevenlevithan.com/archives/javascript-roman-numeral-converter/feed</wfw:commentRss>
		</item>
	</channel>
</rss>
