Solving Algebraic Equations Using Regular Expressions

Regexes suck at math. To a regex engine, the characters 0 through 9 are no more special than any others.

I should mention that there are a couple of exceptions. Perl and PCRE allow dynamic code to be run at any point during the matching process, which opens up a great deal of extra potential. Perl does this with code embedded in the regex; PCRE with callouts to external functions. But those regex flavors are the exceptions, and even with them the extended capabilities only take you so far. Generally, math-related problems such as matching numeric ranges (useful for tasks like matching a set of years within longer text) are a pain in the a** to deal with, when they're possible at all.

However, the power and expressiveness of even basic regular expression syntax can lead to some nifty tricks. Things like matching only non-prime-length strings! The primality regex is somewhat famous now, but another hack that might surprise you is using regexes to solve a simple class of linear equations. I stumbled on the idea for this pattern while messing around with RegexBuddy's awesome debugger. The implementation itself is dead simple and should work pretty much universally, with the exception of strict POSIX ERE implementations or other esoteric flavors which don't allow backreferences. Here's the template:

^(.*)\1{A-1}(.*)\2{B-1}$

That lets us solve for x and y in an equation like 17x + 12y = 51. A and B are placeholders for the constant coefficients, which in this case are 17 and 12, so the regex becomes ^(.*)\1{16}(.*)\2{11}$. We subtract one from A and B because we're repeating backreferences to subpatterns that have already matched once. If you run that regex against a 51-character string (any characters will do, as long as the dot can match them, so no line breaks), the length of $1 (backreference one) will be 3 (which tells us that x = 3), and the length of $2 (backreference two) will be 0 (meaning that y = 0). Indeed, 17×3 + 12×0 = 51. Since string lengths can't be negative or fractional, only non-negative integer solutions can be found; if the equation has none, the regex will not match the string. If there are multiple possible solutions, the one with the highest value of x will be returned, since x is handled earlier in the regex and the quantifiers are greedy.
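As a minimal sketch of the two-variable template, assuming Python's built-in re module (the function name solve_two and the choice of "-" as the filler character are just for illustration, and both coefficients are assumed to be at least 1):

```python
import re

def solve_two(a, b, total):
    """Find non-negative integers (x, y) with a*x + b*y == total.

    Returns None when the regex fails to match, i.e. when no such
    solution exists. Assumes a >= 1 and b >= 1 (illustrative helper,
    not from the article).
    """
    # Fill in the template ^(.*)\1{A-1}(.*)\2{B-1}$ with real numbers.
    pattern = rf"^(.*)\1{{{a - 1}}}(.*)\2{{{b - 1}}}$"
    # The subject only needs the right length; its content is irrelevant.
    match = re.match(pattern, "-" * total)
    if match is None:
        return None
    # The length of each captured group is the value of that variable.
    return len(match.group(1)), len(match.group(2))

print(solve_two(17, 12, 51))  # (3, 0)
```

Because the first group is greedy, this returns the solution with the largest x, which matches the behavior described above.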

Try it out. You can use as many variables as you'd like as long as the equation follows the same form. E.g., 11x + 2y + 5z = 115 can be solved with ^(.*)\1{10}(.*)\2{1}(.*)\3{4}$ and a 115-character subject string (the result: 11×10 + 2×0 + 5×1 = 115). Run ^(.*)\1{12}$ against a 247-character string and you'll get back a 19-character value for backreference one, demonstrating that 13×19 = 247. Keep in mind that as the integers and string lengths get higher and the number of variables increases, the amount of backtracking by the regex engine will also increase. At some threshold this pattern will slow to the point that it's unusable. But I don't really care; it's still cool. ;)
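The same idea stretches to any number of variables. Here's a rough sketch, again assuming Python's re module; the helper name solve and its list-of-coefficients interface are made up for this example, and it deliberately skips the repeated backreference when a coefficient is 1 rather than emitting a {0} quantifier:

```python
import re

def solve(coefficients, total):
    """Solve c1*x1 + c2*x2 + ... == total over non-negative integers.

    Returns a tuple of values (one per coefficient), or None if the
    regex does not match. Illustrative helper, not from the article.
    """
    parts = []
    for index, coeff in enumerate(coefficients, start=1):
        if coeff < 1:
            raise ValueError("coefficients must be positive integers")
        # One capture plus (coeff - 1) repeats of its backreference.
        repeat = f"\\{index}{{{coeff - 1}}}" if coeff > 1 else ""
        parts.append("(.*)" + repeat)
    pattern = "^" + "".join(parts) + "$"
    match = re.match(pattern, "-" * total)
    if match is None:
        return None
    return tuple(len(group) for group in match.groups())

print(solve([11, 2, 5], 115))  # (10, 0, 1)
print(solve([13], 247))        # (19,)
print(solve([4, 6], 7))        # None: 4x + 6y is always even
```

Expect this to get slow quickly for large totals or many variables, since all of the searching is done by backtracking.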

18 thoughts on “Solving Algebraic Equations Using Regular Expressions”

Cool. Just to be anal: you have *one* solution of many for the 17x + 12y = 51; 😀

You need x number of equations for y number of variables. I’d go into the details, but it’s 3AM and I have a deadline at 3pm! oo noes.

Great article.

Cool. I wrote a python function to do this based on your method:

http://xix.org/misc/solve.py

To match all solutions in Perl:

local our @solutions;
/
…regex from article…
(?{ push @solutions, [$1,$2,$3] })
(?!) # Force backtracking.
/x

Have hammer. Seems like a nail. *Swing*

This seems similar (but more complex): Diophantine Equation Solver. I’m not very familiar with Perl though (aside from its regex flavor), and haven’t delved into the code.
