Levels of JavaScript Regex Knowledge
Tuesday, June 26th, 2007
(Adapted from 7 Stages of a [Perl] Regex User.)
- N00b
- Thinks "regular expressions" is open mic night at a poetry bar.
- Uses \w, \d, \s, and other shorthand classes purely by accident if at all.
- Painfully misuses * and especially .*.
- Puts words in character classes.
- Uses | in character classes for alternation.
- Hasn't heard of the exec method.
- Copies and pastes poorly written regexes from the web, taking credit for this on the job.
- Trained n00b
- Uses regexes where methods like substr or indexOf would do.
- Uses modifiers like i and m needlessly.
- Uses [^\w] instead of \W.
- Doesn't know why using [\w\d_] gives away their n00bness.
- Tries to remove HTML tags with replace(/<.*>/g,"") or replace(/<.*?>/g,"").
- Backslashes needlessly\!
- User
- Knows when to use regexes, and when to use string methods.
- Toys with lookahead.
- Uses regexes in conditionals.
- Starts to understand why HTML tags are hard to match with regexes.
- Knows to use (?:…) when a backreference or capture isn't needed.
- Can read a relatively simple regex and explain its function.
- Haxz0r
- Uses lookahead with impunity.
- Sighs at the unavailability of lookbehind and other features from more powerful regex libraries.
- Knows what $`, $', and $& mean in a replacement string.
- Knows the difference between string literal and regex metacharacters, and how this impacts the RegExp constructor.
- Generally knows whether a lazy or greedy quantifier is more appropriate even when it doesn't change what the regex matches.
- Knows their way around the use of replace callback functions.
- Has read Mastering Regular Expressions.
- Knows how to "unroll the loop" (but might not yet be immune to catastrophic backtracking).
- Knows how to step through data using the exec method and a while loop.
- Knows that properties of the global RegExp object and the compile method are deprecated.
- Guru
- Can explain how any given regex will or won't work.
- Can easily (and accurately) determine if a nested quantifier is safe.
- Understands the significance of manually modifying a regex object's lastIndex property and when this can be useful within a loop.
- Knows of numerous cross-browser regex syntax and behavior differences.
- Knows offhand the section number of ECMA-262 3rd Edition which covers regexes.
- Has a preference for particular backreference rules related to capturing group participation and quantified alternation, or is at least aware of the implementation inconsistencies.
- Often knows which browser will run a given regex fastest before testing, based on known internal optimizations and weaknesses.
- Wizard
- Works on a regex engine.
- Has patched the engine from time to time.
- God
- Can add features to the engine at a whim.
- Also created all life on earth using a constructor function.
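For anyone wondering what the exec-and-while-loop item (Haxz0r) and the lastIndex item (Guru) look like in practice, here is a minimal sketch. The helper name collectMatches is made up for illustration, and the zero-length-match guard is just one common reason to modify lastIndex by hand, not the only one.

```javascript
// Hypothetical helper: step through every match using exec and a while loop.
function collectMatches(regex, str) {
  // Work on a copy that has the g flag, so the caller's lastIndex is untouched.
  const flags = "g" + (regex.ignoreCase ? "i" : "") + (regex.multiline ? "m" : "");
  const re = new RegExp(regex.source, flags);
  const results = [];
  let match;
  while ((match = re.exec(str)) !== null) {
    results.push({ text: match[0], index: match.index });
    // A zero-length match leaves lastIndex where it was, so exec would
    // return the same match forever. Bump lastIndex manually to move on.
    if (match[0] === "") {
      re.lastIndex++;
    }
  }
  return results;
}

const words = collectMatches(/\w+/, "regex all the things");
const digitRuns = collectMatches(/\d*/, "a1b");
```

Without the manual bump, the second call would never finish, because \d* happily matches the empty string at position 0 over and over.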


Comment by Bowen on 27 June 2007:
I would assert that there are shades of gray between the levels. I am probably most typically in the “Trained n00b” category; however, I fall into some aspects of the “User” category, though not all of them. So would that make me a n00bified user?
Comment by Steve on 27 June 2007:
I guess.
This is mostly just intended humorously, but it’s also meant to be soul-crushingly tough on people so don’t sweat it if you’re not yet among the highest levels.
Comment by Shady on 28 June 2007:
Level 1 is too advanced. I’m going to go emo out in a corner now. kthxbie.
Comment by Matt on 26 September 2007:
“Toys with lookahead.”
Now, do you mean “Has some clue about lookahead”, or are we talking about the same kind of toying that blew up my parents’ breaker box when I toyed with electricity in the 8th grade?
Comment by Ariel Flesler on 9 November 2007:
*Bows to the god of regex* hehe
Comment by nic_tester on 26 November 2007:
Hm, well, I’m more trained noob than anything else here, but I desperately need to challenge this statement:
Starts to understand why HTML tags are hard to match with regexes.
Why is that hard? It seems so neat to use regexp for that. I need to nuke all html tags except for a select few from a chunk of code but all my attempts at using regexp for this go to rot
Comment by Steve on 26 November 2007:
@nic_tester:
It’s difficult (if not impossible) for the following reasons:
1. HTML tags nest. Most regex flavors do not support recursion (certainly not JavaScript’s).
2. HTML attribute values can contain unencoded < and > characters. Whether or not they’re allowed in valid markup is often irrelevant.
3. HTML attribute values can be surrounded by double quotes, single quotes, or no quotes. Also, multiple attribute values within the same tag can use different quote styles, and quoted values can contain quotes of the alternative type. All of this complicates the handling for point 2.
4. Attributes can appear in any order or not at all. This complicates things if you need to work with more than one specific attribute.
5. Browsers support a whole lot of invalid markup most people wouldn’t think about handling. Accounting for such issues is often quite difficult, and not doing so can result in security hazards.
6. HTML comments can contain HTML tags, which throws off a lot of simple handling.
7. HTML tags are sometimes mixed into content which uses unencoded < and > characters which are not part of HTML tags.
For your task, if you don’t need to account for the edge cases mentioned above you could use something like
str.replace(/<(?!\/?(?:a|select|few)\b)[^>]+>/gi, "")
to get rid of all tags other than a, select, and few.
If you need additional regex construction advice you might want to try someplace like the RegexBuddy or regexadvice.com forums.
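The whitelist idea above can be sketched as a small helper. This is only an illustration under stated assumptions: stripTagsExcept is a made-up name, and the tag names passed in are assumed to be plain words with no regex metacharacters. The optional slash is kept inside the negative lookahead; if it sits in front of the lookahead instead, the engine can backtrack to "no slash", the lookahead then sees "/" rather than a tag name, and closing tags such as </a> get stripped along with everything else.

```javascript
// Hypothetical helper: strip every tag whose name is not in keepNames.
function stripTagsExcept(html, keepNames) {
  // Assumption: keepNames holds plain tag names (no regex metacharacters).
  const keep = keepNames.join("|");
  // Backslashes are doubled because this is a string literal handed to the
  // RegExp constructor. \/? is inside the lookahead so that </a> is
  // protected as well as <a ...>.
  const tag = new RegExp("<(?!\\/?(?:" + keep + ")\\b)[^>]+>", "gi");
  return html.replace(tag, "");
}

const cleaned = stripTagsExcept('<p>Read <a href="/x">this</a> <b>now</b></p>', ["a"]);

// Point 2 from the list: a ">" inside an attribute value ends the match early,
// leaving part of the tag behind in the output.
const leaky = stripTagsExcept('<img alt="a > b" src="x.png">text', ["a"]);
```

The second call is the kind of edge case the numbered list warns about, which is why this approach is only reasonable for markup you control.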
Comment by A. Nony Mouse on 6 February 2008:
In Guru level, I would add:
Knows when Regular Expressions will not work, and is not afraid of more advanced parsing methods (which typically require good understanding of regular expressions anyway).
Alas, I think I’m at the trained n00b level.
As Steve Pavlina says, if you think you’re a 7, you’re probably a 3.
http://www.stevepavlina.com/blog/2005/07/how-to-get-from-a-7-to-a-10/
Comment by Steve on 7 February 2008:
That’s already partially represented at the User level (“Knows when to use regexes…”). A guru should have learned that lesson long ago, and I don’t think that how comfortable you are with “more advanced” parsing languages/tools/models is directly related to your level of regex mastery.
Naturally, anything outside of my domain knowledge is going to be underrepresented.
Comment by Phil on 19 February 2008:
I have to agree with Steve’s statement. I don’t believe there is any connection between comfort levels for regexes and “more advanced” parsing languages … etc. I believe if more of those who are comfortable with the “more advanced” techniques took the time to really understand regexes, they would likely find that many of the “more advanced” techniques are a piece of cake to replace with a relatively simple regex.
Pingback by 10 razones para aprender y usar Expresiones regulares | Picando Código on 4 July 2008:
[...] can match practically anything. In other words, regular expressions are powerful. A regular expression guru can find many appropriate uses for regular expressions [...]
Comment by Andy on 2 December 2008:
Hey, I know this post was a while ago, but I found it looking for help on matching attributes within an HTML tag. After reading it, it made me more determined to work it out for myself, and I just did:
\w+\s*=\s*(["'\w])(?:(?:.*?\1)|[^\s|>]*)
There may be an easier way but I’ve tested this and it works fine with attributes written like attr="val", attr='val', attr='hello "value"' and attr=val.
I’m using it in a function that removes non-white-listed attributes (mostly to catch onmouseover, onfocus, etc). I probably wouldn’t have been as determined if I hadn’t read this post. I felt compelled to try and fit into a higher level, I’m probably a User with a few points in haxz0r. Go me!
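A quick way to check the pattern from this comment against the four attribute styles it mentions is sketched below. The helper name firstAttribute and the last sample input are made up for illustration. The final check is an assumption about where the pattern might surprise you: for an unquoted value, group 1 captures the value's first character, so a later repeat of that character behaves like a closing quote.

```javascript
// The pattern from the comment above, unchanged.
const attrPattern = /\w+\s*=\s*(["'\w])(?:(?:.*?\1)|[^\s|>]*)/;

// Hypothetical helper: text of the first attribute matched, or null.
function firstAttribute(snippet) {
  const match = attrPattern.exec(snippet);
  return match ? match[0] : null;
}

// The four attribute styles listed in the comment.
const samples = [
  'attr="val"',
  "attr='val'",
  "attr='hello \"value\"'",
  "attr=val",
];
const allMatchedWhole = samples.every((s) => firstAttribute(s) === s);

// Unquoted value "xyz": the lazy .*? runs on until it finds another "x",
// so the match swallows the start of the next attribute too.
const overrun = firstAttribute("a=xyz b=x");
```

All four samples match in full, while the last input matches as one long attribute, so a whitelist filter built on this pattern would probably want a separate branch for unquoted values.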