Many people have contributed to developing and promoting the use of regular expressions since they were invented about half a century ago. Here's a short list of some of the most influential people behind the technology. I've written this up for two reasons:
- For people who've only gotten into the technology recently but are interested in some of the history and pioneers behind it.
- Since I fit the above description, I'm hoping readers will help fill me in on other people I've forgotten about or otherwise left out.
Alfred Aho
Aho is the "A" in AWK, and co-author of the Dragon Book – a classic reference covering such topics as building regular expression compilers. He created the initial version of egrep, which provided a big jump in expressiveness from the primitive beginnings of early Unix grep.
Websites: Wikipedia, @Columbia U (photo source)
Jeffrey Friedl
Friedl began using regular expressions with Unix in 1980. He has since written the definitive work on the subject: Mastering Regular Expressions, published by O'Reilly Media. Now in its third edition, it is widely considered a classic programming book (see e.g. this Slashdot review). The longevity of his experience with regexes helps to make him a shrewd opponent in regex debates… odds are he's already succinctly countered your quackery ten or more years ago on Usenet, and has the links to prove it. Friedl lives in Kyoto, Japan with his family of three.
Websites: Blog (photo source), O'Reilly bio
Jan Goyvaerts
Goyvaerts – a Belgian who's been living in Thailand for several years – is not as widely known as the others on this list, but his contributions towards helping thousands of people learn and use regular expressions are significant. Goyvaerts creates the best-in-class tools RegexBuddy and PowerGREP, which use his own JGsoft regex engine (notable for its support of most syntax from popular regex flavors including Perl, .NET, and Java). His website regular-expressions.info – based on the PowerGREP/RegexBuddy help files – is the best and most popular multi-flavor regex tutorial or reference online.
Edit: A year after this post, Jan and I coauthored Regular Expressions Cookbook, now in its second edition.
Websites: Regex blog, Just Great Software (photo source)
Philip Hazel
Hazel grew up in South Africa and has a PhD in applied mathematics. He's best known for writing Exim (a popular open source mail transfer agent) and the PCRE regex library. PCRE is one of the best regex libraries in the world and is used by many projects including Apache, PHP, and probably thousands more. Hazel worked for the University of Cambridge's Computing Service for over 30 years until he retired at the end of September 2007.
Websites: Personal site, UIT Cambridge bio (photo source)
Stephen Kleene
In the 1950s, distinguished American mathematician Stephen Kleene invented regular expressions, which is what he called his notation for expressing the algebra of regular sets. The regex * metacharacter (called the Kleene star) is named after him. Kleene helped lay the foundations for theoretical computer science through his work on recursion theory, which resulted in him being awarded the National Medal of Science in 1990.
Websites: Wikipedia, Bio at nap.edu (photo source)
Henry Spencer
Spencer is a Canadian programmer and space enthusiast who created three widely used, adapted, and influential regular expression libraries. In 1986, he was the first to release a regex library which could be freely included in other programs. Perl 2's regex package was based on and enhanced from Spencer's library, but Spencer's technological tour de force was creating the regex package used by Tcl. This implementation, Jeffrey Friedl writes, "is a hybrid [NFA/DFA engine] with the best of both worlds".
Websites: Wikipedia, O'Reilly bio, Lysator, Bio at NASA (photo source)
Ken Thompson
Thompson is a hacker demigod and the principal inventor of Unix. He received the Turing Award in 1983, the National Medal of Technology in 1998, and the IEEE's Tsutomu Kanai Award in 1999. Thompson introduced regular expressions to the computing world by building Stephen Kleene's notation into his version of the QED text editor, and later ed and other tools. Thompson's original regular expression search implementation is still considered by some to be superior to modern, backtracking algorithms. Did I mention this dude flies MiG fighter jets for fun?
Websites: Wikipedia, Linfo, @Bell Labs, Bio at Bell Labs (photo source)
Larry Wall
Wall created and continues to oversee development of Perl, which has done more than any other programming language to popularize and extend the power of regular expressions. Many programming languages including Java, JavaScript, the .NET Framework, PHP, Python, and Ruby have since adopted regex syntax and features similar to Perl's. The recently released Perl 5.10 continues to push the state of the art in regex power, and upcoming changes outlined by Wall for Perl 6 (called Perl 6 rules; described in Apocalypse, Synopsis, and Exegesis 5) fearlessly redesign Perl's regular expression language.
Websites: Personal site, Wikipedia (photo source)
I'm still a newcomer to the field, so please let me know if you think there are others who should be on this list.
Alternate title for this post is “Men Steven Levithan Would Love To Sleep With.”
Not that there’s anything wrong with that.
You should add Jamie Zawinski (www.jwz.org) for popularizing:
“Some people, when confronted with a problem, think ‘I know, I’ll use regular expressions.’ Now they have two problems.”
and let’s not forget:
http://xkcd.com/208/
Dexter Kozen, anyone?
http://en.wikipedia.org/wiki/Kleene_algebra
The entries are obviously listed in (reverse?) order of Importance to Mankind, but I just had to comment on the amazing coincidence that the ordering is also alphabetical by last name. How amazing a coincidence is that!? :)
Alfred Aho created the fairly big jump from early Unix’s really weak grep to the vastly more expressive egrep syntax (and from NFA to DFA). Prior to that, alternation wasn’t even supported. Dr. Aho should probably be added to the list, even if for no other reason than to get my highly unphotogenic face below the fold.
@Jeffrey, thanks for the note on Aho. He’s been added to the top of the list.
@William, I honestly thought about finding some bullshit way to add Angelina Jolie to the list, but ultimately her regex qualifications just didn’t hold up. I hope you enjoy all the manly computer scientist photos in her absence.
Don’t be so shy, Jeffrey! You’re the second-best-looking chap on the list. :)
I know. I’m more comfortable behind the camera too.
Edi Weitz: http://weitz.de/cl-ppcre/
Normally, this post would have no meaning to me, as I will probably never meet those people and I wouldn’t care who they were when reading their books. However, it says a lot about you, the man who runs a regex oriented blog and who spent the time and energy to build a very complex post with links and everything to honor his mentors. I agree, you should be on that list somewhere.
To be fair, Angelina Jolie still puts words in character classes, so I agree with your decision. However, I heard Scarlett Johansson is becoming pretty handy with nested quantifiers and lookbehinds. You might want to check up on that.
@Siderite, to put my name up there fairly, this list would have to become impossibly long. But I appreciate the sentiment.
@William, that must explain why Ms. Johansson has in recent years eclipsed longtime winner Jolie on my personal hotlist.
Speaking of womenfolk, the most famous woman’s name in regular expressions is probably Abigail of Perl fame (who just happens to be a Dutchman).
I think this is becoming a great Regular Expression problem: write a regex to match female names, but not male. Then globalize it! =))
Well done, good list. In most cases, the people behind technical achievements are forgotten – posts like this help to give them credit.
With all due respect to such luminaries, and at the risk of sounding stupid: what is really stupid is that regular expressions, as a so-called “pattern matching language”, remain crippleware for client-side web validation in 2010. They fail to support a set of whole numbers and a real range operator, which compels us to use fake string representations as actual whole numbers and to waste a whole damn day or longer determining whether a text entry falls within an inclusive range such as 88-144.
Thanks for the list.
Now I know whom to hate!
I pray I will meet one of the persons in this list face to face one day. Regexp are the worst thing in the world. For me every person listed here is a demon from hell. May they burn there!
Nice post. Came up near the top of the search results for “Who invented regular expressions”. Loved reading the comments, they made me laugh. Thanks.
where am all da african americans at?
Interesting stuff. I like finding out this sort of thing. Your post covers Who. I’m wondering Why? Not what it was created for, but what was the thought process in doing this the way it was done… Why was it done this way? It would be interesting I think to understand the thought process involved in coming up with Regex in the first place.
I also wonder why it seems to be merely a language. It seems that, for example, in JavaScript, you should be able to dynamically build a Regex statement as a string and then run it against another string. But you can’t. Oh, you could build a whole regex thing and use the eval() function but this smells too much like a hack. If you do not do the hack you are relegated to hard coded regex statements!
@Orville Chromer, you can build a regex in JavaScript dynamically using the RegExp constructor. E.g., var regex = new RegExp(str + '$');
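A slightly fuller sketch of the same idea, assuming the dynamic part should be matched literally rather than interpreted as regex syntax. The helper names escapeRegExp and endsWithRegex are hypothetical, chosen for illustration; they are not built-ins:

```javascript
// Hypothetical helper: backslash-escape characters that are special in a
// regex, so the input text is matched literally.
function escapeRegExp(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
}

// Build a regex at runtime that matches strings ending with `suffix`.
function endsWithRegex(suffix) {
  return new RegExp(escapeRegExp(suffix) + '$');
}

var regex = endsWithRegex('.txt');
console.log(regex.test('notes.txt')); // true
console.log(regex.test('notes_txt')); // false: the dot is matched literally
```

Without the escaping step, the dot in '.txt' would match any character, so 'notes_txt' would also pass.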
Where are the women?
Steven, the links for Philip Hazel have changed.
– His personal site is gone (the domain is for sale).
– His author bio page is gone.
– The book link today is: http://www.uit.co.uk/the-exim-smtp-mail-server
The updated Bell Labs links for Ken Thompson:
– QED: http://cm.bell-labs.co/who/dmr/qed.html
– MIG: http://cs.bell-labs.co/who/ken/mig.html
Thanks for the lovely overview!
@Lisa, what do you mean exactly? Where are the regex women, indeed? Tell us, for we don’t know. If there were any, they would be listed, most evidently.
Russ Cox wrote a really good paper years ago that explains how the regex engine Spencer wrote and that is in Tcl is not subject to pathological asymptotic events, but PCRE and others are. Find it here:
Regular Expression Matching Can Be Simple And Fast
(but is slow in Java, Perl, PHP, Python, Ruby, …)
https://swtch.com/~rsc/regexp/regexp1.html
/s.
Perhaps not intentionally, but you forgot to mention Chomsky who, apart from being a notable political dissident, is also a great linguist. He introduced the formal definition of four different types of grammars, one of them being the regular grammar, which, together with Kleene’s theory, is of paramount importance to all grammar stuff and automata theory.
Also worth noting is BurntSushi (Andrew Gallant), the author of Rust’s ripgrep, which more or less triggered the evolution of parallel processing of grep queries across large quantities of files (and is very versatile and stable).
@pnkv, it’s intentional, since IMO Chomsky’s influence is too indirect. I have to draw a line somewhere, and I’ve never read a history of regular expressions that gives direct credit to Chomsky’s work.
These days people study regular languages as part of the Chomsky hierarchy, but back when regular expressions were introduced into computing the term referred to Kleene’s regular expressions. See e.g. Logical Instruments: Regular Expressions, AI and thinking about thinking by Christopher M. Kelty, which goes into some detail about the work of Kleene, Thompson, and others.
@Scott G, I linked to that article by Cox in the post, in the section on Ken Thompson. :)