Regex Syntax Highlighter

Do you regularly post regular expressions online? Have you seen the regex syntax highlighting in RegexPal, RegexBuddy, or on my blog (example), and wanted to apply it to your own websites? Prompted by blog reader Mark McDonnell, I've extracted the regex syntax highlighting engine built into RegexPal and made it into its own library, unimaginatively named JavaScript Regex Syntax Highlighter. When combined with the provided CSS, this 1.6 KB self-contained JavaScript file can be used, for instance, to automatically apply regex syntax highlighting to any HTML element with the "regex" class. You can see an example of doing just that on my quick and dirty test page.

Highlighting example: <table\b[^>]*>(?:(?=([^<]+))\1|<(?!table\b[^>]*>))*?</table>
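Applying the highlighting to every element with the "regex" class could look roughly like the sketch below. The post does not name the library's one function, so `highlight` here is a hypothetical stand-in passed in as a parameter, and `highlightAll` is an illustrative helper that is not part of the library. The sketch assumes the function takes a pattern string and returns HTML.

```javascript
// Sketch only: `highlight` stands in for the library's single entry point,
// whose real name and signature are not given in the post.
function highlightAll(elements, highlight) {
  for (const el of elements) {
    // textContent yields the pattern with HTML entities already decoded;
    // the highlighter is assumed to return escaped, marked-up HTML.
    el.innerHTML = highlight(el.textContent);
  }
}

// Assumed browser usage (not run here):
// highlightAll(document.querySelectorAll(".regex"), highlight);
```

Passing the highlighter in as an argument keeps the glue code independent of the library's real function name.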

Although the library is simple (there's just one function to call), the syntax highlighting is pretty advanced and handles all valid JavaScript regex syntax and errors (with errors highlighted in red). An example of its advanced highlighting support is that it knows, based on the context, whether \10 is backreference 10, backreference 1 followed by a literal zero, octal character index 10, or something else altogether due to its position in the surrounding pattern. Speaking of octal escapes (which are de facto browser extensions, not part of the spec), they are correctly highlighted according to their subtle differences inside and outside character classes (outside of character classes only, octals can include a fourth digit if the leading digit is a zero). As far as I'm aware, this is the first JavaScript library for highlighting regex syntax, with or without the level of completeness included here.

For people who might feel inclined to use or improve upon my work, I've made the licensing as permissive as possible to avoid getting in your way. RegexPal is already open source under the GNU LGPL 3.0 License, but this new library is released under the MIT License. If you plan to customize or help upgrade this code, note that it could probably use a bit of an overhaul (it's ripped from RegexPal with minimal modification), and might require an overhaul if you want to cleanly add support for additional regex flavors. Another nifty feature I plan to eventually add is explanatory title attributes for each element in the returned HTML, which might be particularly helpful for deciphering any highlighted errors or warnings.

Let me know if this library is useful for you, or if there are any other features you'd like to see added or changed. Thanks!

Link: JavaScript Regex Syntax Highlighter.
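The context dependence of \10 can be seen directly in a regex engine. The sketch below is not the highlighter's own code. It only shows how a current engine that follows the legacy (Annex B) rules resolves the escape, and browsers from the time of this post may have differed in the details.

```javascript
// Ten capturing groups: \10 is a backreference to group 10.
const tenGroups = new RegExp("(a)(b)(c)(d)(e)(f)(g)(h)(i)(j)\\10");
const backrefMatches = tenGroups.test("abcdefghijj"); // true

// One capturing group, no `u` flag: there is no group 10, so \10 falls
// back to a legacy octal escape for character code 8 (U+0008).
const oneGroup = new RegExp("(a)\\10");
const octalMatches = oneGroup.test("a\u0008"); // true
const backref1ThenZero = oneGroup.test("aa0"); // false under the Annex B rules

// With the `u` flag the legacy fallback is disabled and the pattern is rejected.
let unicodeModeRejects = false;
try {
  new RegExp("(a)\\10", "u");
} catch (e) {
  unicodeModeRejects = e instanceof SyntaxError; // true
}
```

The same two characters therefore mean different things depending on how many capturing groups the pattern has and which flags are set.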

24 thoughts on “Regex Syntax Highlighter”

  1. You know what I really want from RegexPal? Some replace features. Whenever I have to do a quick replace I’ve been hitting up http://www.regextester.com/ but it’s not nearly as nice as RegexPal.

    Except for the fact that it does replacement.

    Can’t imagine it would be that hard to implement and I’ll totally hook you up with some Mountain Dew and a pack of cigarettes next time I see you.

  2. Tsk tsk, William. Define “that hard to implement.” 😛

    To do it right, it would require two new fields–one for replacement text (which would need its own dedicated replacement-text-syntax highlighting) and a read-only field to store the result. With two new multiline textareas, things would start to get a bit cluttered (IMO, part of the appeal of RegexPal is its simplicity and its large regex and subject text textareas). Hence, in order to avoid unnecessary clutter when it’s not needed, you’d need to be able to easily switch between match mode (two fields) and replace mode (four fields). I would also want to highlight replacements in the replacement result similar to matches in the subject text, to help users quickly identify what changed. Then it might be helpful to add a way to quickly transfer the replacement result to the subject text textarea. Keeping in the spirit of RegexPal might also require the replacement result to be updated automatically as you type. But then, that might be annoying sometimes, so an option would be needed to turn off automatic updating. And so on…

    I’d like to add replacement functionality, but unfortunately it would take more work than I’m able to dedicate to it in the near future. (I’d rather keep RegexPal simple and elegant than half-ass replacement features.)

  3. Yeah, I hadn’t envisioned hacking up the RegexPal UI. I’m sure you’ve heard of tabs. 😉 Or a simple toggle switch at the top.

    Real-time would be nice but I’d be happy with a pretty simple replace.

  4. For the last two weeks, I’ve been working on a similar project, but instead of having static colored syntax, mine provides dynamic visual feedback by highlighting components when moused over. During the course of developing this, I came across this mighty fine website here. After perusing your code, and reading some of the archive articles, I can see why you have named this place: Flagrant Badassery. The name is certainly fitting! (I now wish that I had better praised you in my Amazon review of your cookbook – like Jan, you indeed “know regex-fu!”) It’s refreshing to see such well structured and properly documented code. I was impressed enough with your JavaScript know-how on display here that I just ordered a copy of “High Performance JavaScript”.

    If you’re interested in checking it out, I’ve released my script as open source. It doesn’t have all the cool syntax error checking that yours has, but I think it does fill another niche:

    Github repo:
    http://github.com/jmrware/DynamicRegexHighlighter
    Main page:
    http://jmrware.com/articles/2010/dynregexhl/DynamicRegexHighlighter.html
    Tester page:
    http://jmrware.com/articles/2010/dynregexhl/DynamicRegexHighlighterTester.html

    Thanks for this great blog – you have a new fan!
    Jeff Roberson

  5. @ridgerunner, that’s awesome–thanks for sharing. Sorry I didn’t respond earlier but I’ve been on vacation… I’d like to check out your project more closely when I have some time. I’d also like to improve the code for this regex coloring script (it’s largely just copy-pasted from my RegexPal code that hasn’t been updated since 2007) and add some additional features like optional support for common non-JavaScript regex features like the extended-mode regexes on your demo page. Unfortunately, that might have to wait a while though since my recreational development time is severely limited at the moment. 🙁

  6. Bug report: If one has a regex which is matching an HTML entity string (e.g. the entity for a left angle bracket: ‘/&lt;/’), this needs to be encoded on the web page in the .regex element as ‘/&amp;lt;/’ to be valid HTML. But after being processed by jsresyntaxhighlighter.js, this will be displayed as: ‘/</’.

    The culprit is this code at the top of jsresyntaxhighlighter.js:

    value = compressHtmlEntities(value);

    This compresses ‘&amp;lt;’ down to ‘&lt;’, which then erroneously becomes ‘<’ when it is later passed through expandHtmlEntities(). I don’t think this initial call to compressHtmlEntities() is really necessary – is it?
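The symptom reported in comment 6 is what happens when text that has been decoded once is written back as HTML without being re-encoded. The sketch below shows that failure mode under stated assumptions: `decodeEntities` and `encodeEntities` are hypothetical helpers written for illustration. They are not the library's compressHtmlEntities() and expandHtmlEntities(), whose implementations are not shown in this thread.

```javascript
// Hypothetical stand-ins; the library's real helpers may behave differently.
function decodeEntities(text) {
  // &amp; is decoded last so that a single pass never decodes twice.
  return text.replace(/&lt;/g, "<").replace(/&gt;/g, ">").replace(/&amp;/g, "&");
}

function encodeEntities(text) {
  // & is encoded first so the ampersands added afterwards are left alone.
  return text.replace(/&/g, "&amp;").replace(/</g, "&lt;").replace(/>/g, "&gt;");
}

const markup = "/&amp;lt;/";             // what valid page source contains
const pattern = decodeEntities(markup);  // "/&lt;/", the regex the author means

const safeHtml = encodeEntities(pattern); // "/&amp;lt;/", renders as /&lt;/
const unsafeHtml = pattern;               // "/&lt;/", renders as /</
```

Writing `unsafeHtml` into the page reproduces the reported display, while `safeHtml` round-trips the pattern intact.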

  7. I am passing along a possible bug I may have found. If the string to match on is:

    A.BC
    A.B
    A.BD

    Maybe I’m misinformed, but I expected the following expressions to produce the same results:
    A.((BC)|(BD)|(B))
    A.((BC)|(B)|(BD))

    The second, unfortunately, misses the last letter. Otherwise, great tool!

  8. @Greg B, you’re misinformed. Assuming you’re talking about RegexPal, rather than Regex Syntax Highlighter, JavaScript regexes (as with most other modern regex engines) try alternatives from left to right, and end as soon as a successful match is found (i.e., no need to find the longest leftmost match as a POSIX regex would).
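The left-to-right behavior described in comment 8 can be checked with the strings and patterns from comment 7. This is a small sketch and the variable names are illustrative only.

```javascript
const subjects = ["A.BC", "A.B", "A.BD"];

const longestFirst = /A.((BC)|(BD)|(B))/;
const shortBeforeLong = /A.((BC)|(B)|(BD))/;

// Collect the full match text for each subject string.
const matchAll = (re) => subjects.map((s) => re.exec(s)[0]);

const resultFirst = matchAll(longestFirst);    // ["A.BC", "A.B", "A.BD"]
const resultSecond = matchAll(shortBeforeLong); // ["A.BC", "A.B", "A.B"]

// Anchoring the end forces the engine to backtrack into later alternatives.
const anchored = /^A.(BC|B|BD)$/;
const anchoredResult = anchored.exec("A.BD")[0]; // "A.BD"
```

In the second pattern the (B) alternative succeeds before (BD) is ever tried, so the match ends one character early unless something after the group makes that shorter match fail.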

  9. I have an XML file with a certain tag:

    <xyz> 1800 < 3944877 </xyz>

    The less-than sign inside it prevents the XML file from being read, so I was thinking of using a regex to find the less-than sign and replace it.
    I am using C#.

    Can anyone tell me the regex, or is there any other way I can do it?

    I wrote this:
    (<[^]*?>).*(<).*?(<[^]*?>)
    but it only holds as long as each tag is separated by a new line.

  10. Is XRegExp broken with jQuery 1.7.1? If I plug in your example text from the API page, I don’t get the same output that you do. In fact, nothing changes when I try to use an XRegExp.replace… Any help would be greatly appreciated.

  11. Just wanted to say great work with RegexPal. It’s a really helpful tool. Thanks for letting everyone use it!

  12. @Jason, I’d need to see example code to troubleshoot. XRegExp 1.5.1 and later should work just fine with jQuery 1.7.1. (XRegExp 1.5.0 had an edge case IE bug with jQuery 1.7.1, but what you wrote doesn’t sound like the same thing.)

    @ridgerunner Was just checking out your project again. ‘Tis tres cool, and I’m hoping to look into using it in future versions of RegexPal. Are you on Twitter? Here’s my recent Twitter shout-out: https://twitter.com/slevithan/status/180851694668746753

  13. @Dan-el, that’s great! However, note that you’re commenting on a blog post that has nothing to do with XRegExp. I’ve therefore mentioned it on Twitter. Also note that it looks like you’ve inappropriately added the XRegExp.prototype instance methods provided by the XRegExp Prototypes addon (call, apply, xexec, xtest, etc.) as if they were methods of the XRegExp object, which they are not.
