Mimicking Lookbehind in JavaScript
JavaScript supports regex lookahead syntax, but not lookbehind. That's unfortunate, but I'm not content to simply resign myself to that fact. Following are three ways I've come up with to mimic lookbehinds in JavaScript.
For those not familiar with the concept of lookbehinds, they are zero-width assertions which, like the more specific \b, ^, and $ metacharacters, don't actually consume anything; they just match a position within text. This can be a very powerful concept. Read this first if you need more details.
Mimicking lookbehind with the replace method and optional capturing groups
This first approach is not much like a real lookbehind, but it might be "good enough" in some simple cases. Here are a few examples:
// Mimic leading, positive lookbehind like replace(/(?<=es)t/g, 'x')
var output = 'testt'.replace(/(es)?t/g, function($0, $1){
    return $1 ? $1 + 'x' : $0;
});
// output: tesxt

// Mimic leading, negative lookbehind like replace(/(?<!es)t/g, 'x')
var output = 'testt'.replace(/(es)?t/g, function($0, $1){
    return $1 ? $0 : 'x';
});
// output: xestx

// Mimic inner, positive lookbehind like replace(/\w(?<=s)t/g, 'x')
var output = 'testt'.replace(/(?:(s)|\w)t/g, function($0, $1){
    return $1 ? 'x' : $0;
});
// output: text
Unfortunately, there are many cases where lookbehinds can't be mimicked using this construct. Here's one example:
// Trying to mimic positive lookbehind, but this doesn't work
var output = 'ttttt'.replace(/(t)?t/g, function($0, $1){
    return $1 ? $1 + 'x' : $0;
});
// output: txtxt
// desired output: txxxx
The problem is that the regexes rely on actually consuming the characters which should be within zero-width lookbehind assertions, then simply putting the match back unchanged (an effective no-op) depending on whether the backreferences contain a value. Since the actual matching process here doesn't work anything like real lookbehinds, this only works in a limited number of scenarios. Additionally, it only works with the replace method, since other regex-related methods don't offer a mechanism to dynamically "undo" matches. However, since you can run arbitrary code in the replacement function, it does offer a limited degree of flexibility.
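To see the consumption problem concretely, it can help to list the span each match covers. The following is only an illustrative sketch (the consumed array is a name made up here): it runs the failing pattern from the example above over 'ttttt' and records where each match starts and ends.

```javascript
// Record "start-end:text" for every match of the failing pattern.
var regex = /(t)?t/g;
var data = 'ttttt';
var consumed = [];
var match;

while ((match = regex.exec(data)) !== null) {
  consumed.push(match.index + '-' + regex.lastIndex + ':' + match[0]);
}
// consumed: ['0-2:tt', '2-4:tt', '4-5:t']
```

Every second t is swallowed by the leading (t)? group of a match, so it is never tested as the t to replace. A real lookbehind would leave those characters available to later matches.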
Mimicking lookbehind through reversal
The next approach uses lookaheads to mimic lookbehinds, and relies on manually reversing the data and writing your regex backwards. You'll also need to write the replacement value backwards if using this with the replace method, flip the match index if using this with the search method, etc. If that sounds a bit confusing, it is. I'll show an example in a second, but first we need a way to reverse our test string, since JavaScript doesn't provide this capability natively.
String.prototype.reverse = function () {
    return this.split('').reverse().join('');
};
Now let's try to pull this off:
// Mimicking lookbehind like (?<=es)t
var output = 'testt'.reverse().replace(/t(?=se)/g, 'x').reverse();
// output: tesxt
That actually works quite nicely, and allows mimicking both positive and negative lookbehind. However, writing a more complex regex with all nodes reversed can get a bit confusing, and since lookahead is used to mimic lookbehind, you can't mix what you intend as real lookaheads in the same pattern.
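As a rough sketch of the other cases mentioned above, here is the negative variant plus the index flip needed for the search method. It uses a standalone reverseString helper (a name chosen for this sketch) instead of extending String.prototype. Note that searching the reversed data finds the match closest to the end of the original string, and the index arithmetic shown assumes a single-character match.

```javascript
// Hypothetical standalone helper; equivalent to the prototype method above.
function reverseString(str) {
  return str.split('').reverse().join('');
}

// Mimic negative lookbehind like replace(/(?<!es)t/g, 'x')
var replaced = reverseString(
  reverseString('testt').replace(/t(?!se)/g, 'x')
);
// replaced: 'xestx'

// Mimic search(/(?<=es)t/): search the reversed data, then flip the index.
var data = 'testt';
var reversedIndex = reverseString(data).search(/t(?=se)/);
// For a one-character match, the original index is length - 1 - reversedIndex.
var index = reversedIndex === -1 ? -1 : data.length - 1 - reversedIndex;
// index: 3
```

For longer matches, the flipped index would be data.length - reversedIndex - matchLength, which means the match length has to be known (for example, from exec rather than search).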
Note that reversing a string and applying regexes with reversed nodes can actually open up entirely new ways to approach a pattern, and in a few cases might make your code faster, even with the overhead of reversing the data. I'll have to save the efficiency discussion for another day, but before moving on to the third lookbehind-mimicking approach, here's one example of a new pattern approach made possible through reversal.
In my last post, I used the following code to add commas every three digits from the right for all numbers which are not preceded by a dot, letter, or underscore:
String.prototype.commafy = function () {
    return this.replace(/(^|[^\w.])(\d{4,})/g, function($0, $1, $2) {
        return $1 + $2.replace(/\d(?=(?:\d\d\d)+(?!\d))/g, '$&,');
    });
};
Here's an alternative implementation:
String.prototype.commafy = function() {
    return this.
        reverse().
        replace(/\d\d\d(?=\d)(?!\d*[a-z._])/gi, '$&,').
        reverse();
};
I'll leave the analysis for your free time.
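One way to start that analysis is to run the reversed pattern on a few inputs. The sketch below wraps the same regex in standalone functions (reverseString and commafy are names chosen here so that String.prototype is left alone); the sample inputs are made up for illustration.

```javascript
// Hypothetical standalone helper for reversing a string.
function reverseString(str) {
  return str.split('').reverse().join('');
}

// Same regex as the reversed implementation above, as a plain function.
function commafy(str) {
  return reverseString(
    reverseString(str).replace(/\d\d\d(?=\d)(?!\d*[a-z._])/gi, '$&,')
  );
}

var plain = commafy('1234567');               // '1,234,567'
var mixed = commafy('total 1234567.1234');    // 'total 1,234,567.1234'
var fraction = commafy('0.1234567');          // '0.1234567'
var identifier = commafy('abc1234');          // 'abc1234'
```

In the reversed string, \d\d\d(?=\d) adds a comma after each group of three digits that is followed by another digit. The negative lookahead (?!\d*[a-z._]) rejects any group whose remaining digits run into a letter, dot, or underscore, which in the original orientation means the number was preceded by one of those characters.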
Finally, we come to the third lookbehind-mimicking approach:
Mimicking lookbehind using a while loop and regexp.lastIndex
This last approach has the following advantages:
- It's easier to use (no need to reverse your data and regex nodes).
- It allows lookahead and lookbehind to be used together.
- It allows you to more easily automate the mimicking process.
However, the trade off is that, in order to avoid interfering with standard regex backtracking, this approach only allows you to use lookbehinds (positive or negative) at the very start and/or end of your regexes. Fortunately, it's quite common to want to use a lookbehind at the start of a regex.
If you're not already familiar with the exec method available for RegExp objects, make sure to read about it at the Mozilla Developer Center before continuing. In particular, look at the examples which use exec within a while loop.
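For reference, the basic exec-in-a-while-loop pattern looks roughly like this. It is a minimal sketch with made-up sample data; the point is that a regex with the g flag remembers where to resume through its lastIndex property, which is what the next listing manipulates.

```javascript
// Each exec call resumes from regex.lastIndex and updates it after a match.
var regex = /t/g;
var data = 'tat';
var positions = [];
var match;

while ((match = regex.exec(data)) !== null) {
  positions.push(match.index + ':' + regex.lastIndex);
}
// positions: ['0:1', '2:3']
// After the final failed exec call, regex.lastIndex is reset to 0.
```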
Here's a quick implementation of this approach, in which we'll actually toy with the regex engine's bump-along mechanism to get it to work as we want:
var data = 'ttttt',
    regex = /t/g,
    replacement = 'x',
    match,
    lastLastIndex = 0,
    output = '';

regex.x = {
    gRegex: /t/g,
    startLb: {
        regex: /t$/,
        type: true
    }
};

function lookbehind (data, regex, match) {
    return (
        (regex.x.startLb ? (regex.x.startLb.regex.test(data.substring(0, match.index)) === regex.x.startLb.type) : true) &&
        (regex.x.endLb ? (regex.x.endLb.regex.test(data.substring(0, regex.x.gRegex.lastIndex)) === regex.x.endLb.type) : true)
    );
}

while (match = regex.x.gRegex.exec(data)) {
    /* If the match is preceded/not by start lookbehind, and the end of the match is preceded/not by end lookbehind */
    if (lookbehind(data, regex, match)) {
        /* replacement can be a function */
        output += data.substring(lastLastIndex, match.index) + match[0].replace(regex, replacement);
        if(!regex.global){
            lastLastIndex = regex.x.gRegex.lastIndex;
            break;
        }
    /* If the inner pattern matched, but the leading or trailing lookbehind failed */
    } else {
        output += match[0].charAt(0);
        /* Set the regex to try again one character after the failed position, rather than at the end of the last match */
        regex.x.gRegex.lastIndex = match.index + 1;
    }
    lastLastIndex = regex.x.gRegex.lastIndex;
}
output += data.substring(lastLastIndex);
// output: txxxx
That's a fair bit of code, but it's quite powerful. It accounts for using both a leading and trailing lookbehind, and allows using a function for the replacement value. Also, this could relatively easily be made into a function which accepts a string for the regex using normal lookbehind syntax (e.g., "(?<=x)x(?<!x)"), then splits it into the various parts it needs before applying it.
Notes:
- regex.x.gRegex should be an exact copy of regex, with the difference that it must use the g flag whether or not regex does (in order for the exec method to interact with the while loop as we need it to).
- regex.x.startLb.type and regex.x.endLb.type use true for "positive," and false for "negative."
- regex.x.startLb.regex and regex.x.endLb.regex are the patterns you want to use for the lookbehinds, but they must contain a trailing $. The dollar sign in this case does not mean end of the data, but rather end of the data segment they will be tested against.
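To give an idea of how the loop might be packaged for reuse, here is a simplified sketch limited to a leading lookbehind. The replaceAfter name and its parameter order are assumptions made for this example, not an existing API. Unlike the listing above, a string replacement is inserted literally (no $1-style tokens), and text skipped after a failed lookbehind is copied later rather than one character at a time.

```javascript
// Hypothetical helper, not from the original listing: replaces matches of
// `inner` only where the text before the match ends (isPositive === true) or
// does not end (isPositive === false) with a match of `lookbehind`.
function replaceAfter(data, lookbehind, isPositive, inner, replacement) {
  // Anchor the lookbehind pattern to the end of the preceding text.
  var lbRegex = new RegExp('(?:' + lookbehind.source + ')$',
    lookbehind.ignoreCase ? 'i' : '');
  // Always search with the g flag so exec advances through the data.
  var gRegex = new RegExp(inner.source,
    'g' + (inner.ignoreCase ? 'i' : '') + (inner.multiline ? 'm' : ''));
  var output = '';
  var lastLastIndex = 0;
  var match;

  while ((match = gRegex.exec(data)) !== null) {
    if (lbRegex.test(data.substring(0, match.index)) === isPositive) {
      // Copy the skipped text, then the replacement (a function receives
      // the matched text; a string is inserted as-is).
      output += data.substring(lastLastIndex, match.index) +
        (typeof replacement === 'function' ? replacement(match[0]) : replacement);
      lastLastIndex = gRegex.lastIndex;
      if (!inner.global) {
        break; // Without the g flag, stop after the first replacement.
      }
      if (match[0] === '') {
        gRegex.lastIndex++; // Avoid looping forever on a zero-length match.
      }
    } else {
      // The lookbehind failed: retry one character after the match start.
      gRegex.lastIndex = match.index + 1;
    }
  }
  return output + data.substring(lastLastIndex);
}

var positive = replaceAfter('ttttt', /t/, true, /t/g, 'x');   // 'txxxx'
var negative = replaceAfter('testt', /es/, false, /t/g, 'x'); // 'xestx'
var firstOnly = replaceAfter('ttttt', /t/, true, /t/, 'x');   // 'txttt'
```

Passing the lookbehind as its own regex sidesteps having to parse (?<=...) syntax out of a pattern string, at the cost of a less familiar call signature.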
If you're wondering why there hasn't been any discussion of fixed- vs. variable-length lookbehinds, that's because none of these approaches have any such limitations. They support full, variable-length lookbehind, which no regex engines I know of other than .NET and JGsoft (used by products like RegexBuddy) are capable of.
In conclusion, if you take advantage of all of the above approaches, regex lookbehind syntax can be mimicked in JavaScript in the vast majority of cases. Make sure to take advantage of the comment button if you have feedback about any of this stuff.
Update 2012-04: See my followup blog post, JavaScript Regex Lookbehind Redux, where I've posted a collection of short functions that make it much easier to simulate leading lookbehind.
Pingback by All in a days work… on 22 June 2007:
[…] Mimicking Lookbehind in JavaScript Shows three powerful methods to mimic regular expression lookbehind syntax in JavaScript. (tags: JavaScript RegEx) […]
Comment by telega on 6 August 2007:
I mimic lookbehind assertions this way:
str.replace(/(foo)bar/, '$1baz');
negative:
str.replace(/([^f][^o][^o])bar/, '$1baz');
Comment by Steve on 6 August 2007:
telega, it’s a bit of a stretch to say those mimic lookbehinds. They’re similar to the “Mimicking lookbehind with replace() and optional capturing groups” examples (which, as I’ve already stated, work very differently from real lookbehinds), but they’re less flexible. They also have the same limitation of only being usable with the replace() method.
BTW, your latter regex is flawed in that, e.g., it will not match “forbar”. You can fix it with something like /(?!foo)(.{3})bar/
Comment by telega on 7 August 2007:
You’re right, my second regex fails. But your fix is so clever, thanks!
Comment by telega on 8 August 2007:
Could you verify my regexp? I have a string containing html. I need to replace all occurrences of ‘<‘ and ‘>’ except for <br>, <br/> and <br some attributes >. I did this (including your fix):
str.replace(/<(.|(?!br).{2}[^>]*)>/gi,'&lt;$1&gt;');
Comment by Steve on 8 August 2007:
Change to:
str.replace(/<(?!br\b)([^>]*)>/gi,'&lt;$1&gt;');
But be aware that this only replaces paired “<” and “>” characters.
Comment by telega on 8 August 2007:
Thanks! I think that will do for my needs.
I was implementing “html source” button functionality for a small wysiwyg editor. If you like, you can get original version here: http://www.unverse.net/simplewhizz.html or with “html source” from my site: http://telega.phpnet.us/download/whizzette.js
Comment by Gerard Pinzone on 20 August 2007:
On the first block of code, second example: Mimic negative lookbehind, shouldn’t the return be:
return $1 ? $0 : 'x';
since $1 is undefined?
Comment by Steve on 21 August 2007:
@Gerard,
You’re quite right. I’ve fixed it. Note that it wouldn’t make a difference in Firefox 2 or Safari 3 beta, because of the way those browsers handle non-participating capturing groups with the replace method.
Comment by Steven on 22 September 2007:
Any advice on looking behind to match > where it is inside of HTML content without matching the close of actual html tags. Likewise for ” and &.
I’ve had a heck of a time trying to come up with something.
Sample input might be (if this comes out right after encoding)
<html>
<head>
</head>
<body>This is a <test>.
</body>
</html>
Comment by Steve on 22 September 2007:
@Steven, I’m going to guess that you consider > characters to be separate from HTML tags if they are not preceded by a < character with no > characters in between. That would be something like
(?<!<[^>]*)>
If you don’t have real lookbehind available with your regex flavor, use the techniques outlined in this post to adapt it for your needs as appropriate.
I’m not sure what you mean by ‘Likewise for ” and &’, since those characters don’t close HTML tags like > does.
In any case, if you have questions which are unrelated to my blog posts, I would recommend you direct them to another location such as regexadvice.com or the RegexBuddy forums (both of which I hang out at from time to time).
Comment by Martin on 3 October 2007:
Just chiming in from EE (posting as mreuring there) to acknowledge that I particularly like your way-outside-the-box thinking of reversing the string and using look-aheads on it, just to mimic look-behinds!
Nice post!
Comment by Steve on 4 October 2007:
Thanks, Martin! BTW, you can see a somewhat more complex example of using the node-by-node reversal approach to mimicking lookbehind at the end of this post: https://blog.stevenlevithan.com/archives/get-html-summary
Comment by Gianni Chiappetta on 29 January 2008:
Hey,
I’m trying to implement your negative look-behind solution in one of my current projects, but I am having a little trouble. I have a textarea with a bunch of junk in it. Out of that junk I need to match both a URL and an email address. The only problem is, if the email is before the URL, then the host portion of the email is used as the URL. So I would like to mimic this type of look-behind: (?>!\@) . Any ideas?
Comment by Steve on 29 January 2008:
I think you meant (?<!@). Unfortunately, you haven’t given enough info to base things on. According to the mimicking-via-reversal approach, that segment of the regex would become (?!@).
Rather than mimicking lookbehind, an alternative approach might be including the preceding character in the match and checking what it is outside of the regex. You might also be able to simply merge the two regexes using top-level alternation, with the email address part of the regex coming first.
Comment by foo on 16 April 2008:
>> "1283475822347".replace(/(\d{1,3})(?=(?:\d{3})+(?!\d))/g,"$1,");
"1,283,475,822,347"
Comment by Steve on 16 April 2008:
@foo, that’s not equivalent to the solution I posted. If you keep just the equivalent part from my code, you get replace(/\d(?=(?:\d\d\d)+(?!\d))/g,'$&,'), which is very similar to what you posted, except that it’s better (shorter, faster, etc.). 😉
Comment by Gordon on 11 December 2008:
Since I’m a n00b and couldn’t figure out how to use your functions for match, rather than replace, I did the following:
var source = "Hello Something Goodbye Something Whatever Something";
//I want to match 'Something', only if it has 'Goodbye ' before it.
var match = source.match(/Goodbye Something/).join().match(/Something/).join();
I’ve only been messing with Javascript for a few weeks, so I have no idea how limited the above is in efficiency or usability in more complicated patterns, but it works for what I’m doing 🙂
Comment by Dewi Morgan on 11 February 2009:
We can’t do:
htmlstr = htmlstr.replace(/(((?![“>]))((?:ftp:\/\/|http:\/\/)[^”\r\n })]+|\bwww\.[^”\r\n })]+))/gm, ‘$1‘);
because Javascript doesn’t do lookbehind. But we can do:
htmlstr = htmlstr.replace(/(^|[^”>])((?:ftp:\/\/|http:\/\/)[^”\r\n })]+|\bwww\.[^”\r\n })]+)/gm, ‘$1$2‘);
[not a perfect URL match, but good enough for most]
Regexes make me happy.
Comment by David Chambers on 26 May 2009:
While limitations are often a source of frustration, dreaming up ingenious workarounds to overcome them is one of the joys of writing code. Thanks for reminding us all of this fact!
Comment by Dennis Chu on 29 November 2009:
Good Day,
Actually, I was searching through the book and also looked at your regex; however, due to my limited understanding of how regexes are implemented in JavaScript, I found it rather hard to understand.
I already have a Perl regex, "(?<=\=)\w+(?=;|,)", to trap the string between the "=" sign and the "," or ";".
C132 C 1=UNNAMED_1_CON18_I1_A,
2=EARTH;
Would it be possible to show how I could store the content detected by the Perl regex in a variable for further processing?
Comment by F1LT3R on 25 January 2010:
I tried your method but didn’t get the results I was looking for…
When I run this expression: /(?<![#\da-zA-z])(\d+)f/g
On this string: “1.2f; #00ff00”
I should get: “1.2; #00ff00”
The ‘f’ should get removed.
Any ideas?
Comment by devMan on 1 February 2010:
hello,
I want a JavaScript regex to retrieve the numbers followed by:
* “or” | “and”
* ) (closing parenthesis)
* nothing (end of string)
and preceded by:
* “or” | “and”
* ( (opening parenthesis)
* nothing (beginning of string)
thank you in advance
Comment by devMan on 1 February 2010:
Actually,
I have this regexp but in php:
(? <= and | or | \( ) $myString (? = (and | or | \\) )?)
Pingback by Regular Expression Basics on 7 February 2010:
[…] are ways to mimic it but that is outside the scope of this article so I’ll refer you to here: https://blog.stevenlevithan.com/archives/mimic-lookbehind-javascript <script> var string = "Here is some text about JavaScript (also known as JScript). It […]
Comment by vivin on 17 March 2010:
Have you figured out how to mimic lookbehinds in splits? For example how would one go about converting the following:
/(?<!\()(?=#)/
To work in Javascript?
Or is this a lost cause?
Comment by mukesh on 20 April 2010:
i want to know that how find safari version…
Comment by Fish on 22 July 2010:
Hi Steven ..
Using this concept, what do you think about getting “else’s”, testing if before that, has a “if” expression?
Like this:
Regexp: /(if(.|\n)*?)?else/ig
Delphi Code to Test:
case m of
2: if (y mod 4) = 0 then begin
if d>29 then vd:=false
else vd:=true ;
end
else begin
if d>28 then vd:=false
else vd:=true ;
end;
4,6,8,9,11: if d>30 then vd:=false
else vd:=true ;
else
vd:=true ;
end;
Important: I want to get every “else”, and then test whether that “else” has an “if” before it.
But using this regexp I can’t tell whether there is an “if” inside another “if”.
Sorry if that sounds a bit confusing.
Do you have a better solution?
This sounds like a challenge! 😉
Tks man!
Comment by Jim on 4 August 2010:
Thanks so much for this. I have been looking for hours !
Comment by Christian Bodart on 17 January 2011:
just use string.replace optional matches and a function to filter:
var x = /(\bvar\b)?([\;\{\s+\}]*)(foodle)/gm
var str = "var foodle = 2; foodle(); ";
var x = str.replace(x,function () {
if (arguments[1]) return arguments[0]
return arguments[0].replace("foodle",arguments[1]+"asdfg")
});
alert(x)
Comment by Christian Bodart on 17 January 2011:
Or if you want to generate the matches rather than do a replace then just store the results in an object etc and return arguments[0]
Comment by Emmanuel on 17 February 2011:
Great stuff! Thanks so much for this. Saved me a lot of time.
Comment by Brian Cray on 8 March 2011:
Brilliant.
Pingback by oddnetwork.org » Blog Archive » Create anchor links in Twitter status text with JavaScript on 17 March 2011:
[…] support negative look-behind constructs in its regular expressions engine. Based on some suggestions for emulating negative look-behind functionality in JavaScript I found online, I was able to take my preg_replace_all calls in PHP and convert them into […]
Pingback by My experience writing Opera extensions | HCoder.org on 9 July 2011:
[…] lookahead/lookbehind. The latter, which I needed, is not supported by Javascript, so I had to use a lookbehind mimicking trick/workaround to get what I […]
Comment by Dave on 8 September 2011:
What would be useful would be to provide the equivalent of \b(\w+)\b for getting all words in document for unicode. Xregex provides \p{L} for matching unicode characters and \P{L} as the negation so it should be possible to do something like the following
/(?<=\P{L})(\p{L}+)(?=\P{L})/
to get all unicode words
Pingback by javascript regex – look behind alternative? on 18 September 2011:
[…] Here are some thoughts, but needs helper functions. I was hoping to achieve it just with a regex: https://blog.stevenlevithan.com/archives/mimic-lookbehind-javascript […]
Comment by David on 23 September 2011:
I think the first line in the else branch of the while loop in the 3rd method needs to be changed from
output += match[0].charAt(0)
to
output += data.substring(lastLastIndex, match.index) + match[0].charAt(0)
Otherwise, if the regex.x.gRegex.exec(data) consumes multiple characters via non-matches, those characters won’t be included in the output.
Comment by David on 23 September 2011:
Also in the 3rd method, ‘regex.x.gRegex’ should NOT be an exact copy of ‘regex’. ‘regex.x.gRegex’ should be the intended regex, including any lookAHEADS. ‘regex’ should be a copy of ‘regex.x.gRegex’ except that any positive lookaheads whose values won’t be in a match of ‘regex.x.gRegex’ (e.g., any trailing lookahead) should be excluded. Otherwise the match[0].replace(regex, replacement) call will fail because ‘regex’ includes a positive lookahead but the characters that lead to the match aren’t included in match[0].
Comment by Steven Levithan on 15 April 2012:
@David, you’re quite right. Note that I’ve posted improved and easier to use code for emulating leading lookbehind at https://gist.github.com/2387872.
Pingback by JavaScript Regex Lookbehind Redux on 15 April 2012:
[…] years ago I posted Mimicking Lookbehind in JavaScript on this blog, wherein I detailed several ways to emulate positive and negative lookbehind in […]
Pingback by Split string with a single occurence (not twice) of a delimiter in Javascript | t1u on 13 August 2012:
[…] negative lookbehind (as @jbabey mentioned these are not supported in JS) like that (inspired by this article): […]
Pingback by Adding Client-Side Support for PhoneAttribute in DAValidation on 20 September 2012:
[…] regular expressions building and testing tool Expresso. Also, a great article of Steven Levithan Mimicking Lookbehind in JavaScript helped to look deeper and actually find the right solution of the […]
Pingback by Ross Barnes – Freelance Web Developer – Glasgow, Scotland on 26 January 2013:
[…] lookbehind functionality in JavaScript, they can be mimicked using parenthesis (group matches). See here for more […]
Comment by aliteralmind on 9 April 2014:
This post has been added to the [Stack Overflow Regular Expression FAQ](http://stackoverflow.com/a/22944075/2736496), under “Lookarounds”.
Pingback by Reference - What does this regex mean? - Technology on 22 February 2015:
[…] Javascript negative lookbehind equivalent External link […]
Comment by haymaker on 29 April 2015:
nothing like writing a blog post on doing something regex, to spark an entire comment thread asking to verify their regex like it’s stackoverflow or something
Comment by Roman on 15 February 2017:
Please, help!
I need to extract Text Goes Here from (without angle brackets) using Javascript/VBscript regexp. Neither of them support look behind (to strip off the first angular bracket). What would be the alternative?
Thank you in advance!
Comment by RegNovice on 21 May 2018:
I have the following problem scenario:
Objective: Get rid off all numerics except when the digits are preceded by the term ‘ITEMSTARTMARKER’
Sample Text: ITEMSTARTMARKER 8 components interest rates 77 term debt ITEMSTARTMARKER 16 fiscal year 2017
I use the negative lookbehind (?<!ITEMSTARTMARKER )\d+ but it doesn't work when more than one digit follows ITEMSTARTMARKER. It matches the 6 in ITEMSTARTMARKER 16.
Any ideas how I can get around it?
Pingback by Reference – What does this regex mean? – PythonCharm on 12 November 2018:
[…] JavaScript negative lookbehind equivalents External link […]
Pingback by Regex replace (in Python) – a simpler way? – PythonCharm on 8 January 2019:
[…] solution, but it’s still a very clear, straightforward one-liner. And if you look at what an expert has to say on the matter (he’s talking about JavaScript, which lacks lookbehinds entirely, but many of the principles […]
Pingback by javascript - JavaScript regex - split a string on 7 March 2019:
[…] page is probably: blog.stevenlevithan.com/archives/mimic-lookbehind-javascript I'm not posting this as an answer, because I don't have time right now to come up with a […]
Pingback by RegEX, Sigma, Yara, Snort/Zeek/Bro | Cloud Solutions Architect on 24 January 2021:
[…] https://blog.stevenlevithan.com/archives/mimic-lookbehind-javascript […]