Following are some UDFs I wrote recently to make using regexes in ColdFusion a bit easier. The biggest deal here is my reMatch() function.
reMatch(), in its most basic usage, is similar to JavaScript's String.prototype.match() method. Compare getting the first number in a string using reMatch() vs. built-in ColdFusion functions:
- reMatch:
<cfset num = reMatch("\d+", string) />
- reReplace:
<cfset num = reReplace(string, "\D*(\d+).*", "\1") />
- reFind:
<cfset match = reFind("\d+", string, 1, TRUE) />
<cfset num = mid(string, match.pos[1], match.len[1]) />
All of the above would return the same result, unless a number wasn't found in the string, in which case the reFind()-based method would throw an error since the mid() function would be passed a start value of 0. (The reReplace() version would fail more quietly, handing back the original string unchanged.) I think it's pretty clear from the above which approach is easiest to use for a situation like this, and it's easy to envision scenarios where this functionality would shorten code even more drastically.
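For comparison, the reFind() route can be made safe by checking the reported position before calling mid(). This is a minimal sketch; the sample value of the string variable and the empty-string fallback are assumptions chosen for illustration.

```cfml
<!--- Hypothetical sample input that contains no digits --->
<cfset string = "no digits here" />
<cfset match = reFind("\d+", string, 1, TRUE) />
<cfif match.pos[1] GT 0>
	<!--- A match was found, so pos/len are safe to pass to mid() --->
	<cfset num = mid(string, match.pos[1], match.len[1]) />
<cfelse>
	<!--- No match: reFind() reports a position of 0, so skip mid() entirely --->
	<cfset num = "" />
</cfif>
```

That is seven lines of boilerplate for something that ought to be a one-liner.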
Still, that's just the beginning of what reMatch() can do. Change the scope argument from the default of "ONE" to "ALL" (to follow the convention used by reReplace(), etc.), and the function will return an array of all matches. Finally, set the returnLenPos argument to TRUE and the function will return either a struct or array of structs (based on the value of scope) containing the len, pos, AND value of each match. This is very different from how the returnSubExpressions argument of reFind() works. When using returnSubExpressions, you get back a struct containing arrays of the len and pos (but not value) of each backreference from the first match.
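To make the shape of that data concrete, here is a rough sketch of how an array of len/pos/value structs can be collected by calling reFind() in a loop. The function name findAllMatches() is hypothetical and this is not the actual reMatch() source; it only illustrates the kind of result described above.

```cfml
<cffunction name="findAllMatches" output="false" returntype="array">
	<cfargument name="regex" type="string" required="yes" />
	<cfargument name="string" type="string" required="yes" />
	<cfset var results = arrayNew(1) />
	<cfset var start = 1 />
	<cfset var found = "" />
	<cfset var item = "" />
	<cfloop condition="start LTE len(arguments.string)">
		<!--- With the fourth argument TRUE, reFind() returns pos and len arrays;
		      element 1 describes the whole match --->
		<cfset found = reFind(arguments.regex, arguments.string, start, TRUE) />
		<cfif found.pos[1] EQ 0>
			<cfbreak />
		</cfif>
		<cfset item = structNew() />
		<cfset item.pos = found.pos[1] />
		<cfset item.len = found.len[1] />
		<cfif found.len[1] GT 0>
			<cfset item.value = mid(arguments.string, found.pos[1], found.len[1]) />
		<cfelse>
			<cfset item.value = "" />
		</cfif>
		<cfset arrayAppend(results, item) />
		<!--- Resume after this match; move at least one character so an
		      empty match cannot cause an endless loop --->
		<cfset start = found.pos[1] + max(found.len[1], 1) />
	</cfloop>
	<cfreturn results />
</cffunction>

<cfset hits = findAllMatches("\d+", "a1b22c333") />
<!--- hits has three elements; hits[2].value is "22", hits[2].pos is 4, hits[2].len is 2 --->
```

Each element carries its own value alongside its position and length, so there is no need to go back to the source string with mid().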
Here's the code, with four additional UDFs (reMatchNoCase(), match(), matchNoCase(), and reEscape()) added for good measure:
See the demo and get the source code.
Now that I've got a deeply featured match function, all I need Adobe to add to ColdFusion in the way of regex support is lookbehinds, atomic groups, possessive quantifiers, conditionals, balancing groups, etc., etc.…
Hey Steve,
After playing with your REMatch method (which has helped me more than once) I made a little change. I added ‘SUB’ to the scope argument, which will loop over each match and return sub-matches. You can read all about it here. I don’t have code snippets in the blog yet, but there is a download available. If you have any suggestions please let me know.
Didn’t mean to be Anon on that last one!
Andrew,
Glad to hear this helped you. That is a potentially very useful modification.
By the way, when I wrote this, I wasn’t aware that you could use underlying Java regex methods in ColdFusion. If I ever get around to releasing an updated version of REMatch and Adobe doesn’t include something similar natively in CF8, I’ll use the Java methods, which offer better performance and more powerful regular expression syntax (e.g., lookbehind). That would be my main suggestion for your CFC… use Java.
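As a rough idea of what the Java route looks like from CFML, here is a minimal sketch using java.util.regex through createObject(). The sample string and variable names are made up for illustration, and it assumes the server allows Java object creation.

```cfml
<!--- Hypothetical sample input --->
<cfset string = "Total: $150 (was $200)" />
<!--- The lookbehind (?<=\$) matches digits only when preceded by a dollar sign --->
<cfset pattern = createObject("java", "java.util.regex.Pattern").compile("(?<=\$)\d+") />
<cfset matcher = pattern.matcher(string) />
<cfset matches = arrayNew(1) />
<cfloop condition="matcher.find()">
	<cfset item = structNew() />
	<cfset item.value = matcher.group() />
	<!--- Java offsets are 0-based; add 1 to line up with ColdFusion's 1-based positions --->
	<cfset item.pos = matcher.start() + 1 />
	<cfset item.len = matcher.end() - matcher.start() />
	<cfset arrayAppend(matches, item) />
</cfloop>
<!--- matches now holds one struct each for "150" and "200" --->
```

The Matcher object tracks its own position between find() calls, so the loop needs no manual start offset.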
Thanks for posting!
Hey Steve, me again. I gave the java.util.regex package a shot, and was able to get a basic version working. Check it out here
Steve,
There seems to be a bug with your reMatch function. I'm doing an HTTP request to Google for some search results from IMDb (sample URL is http ://www.google.com/search?q=imdb+Police+Academy&ie=utf-8 &oe=utf-8&aq=t&rls=org.mozilla:en-US:official&client=firefox-a)
and then using your function to match the found links from IMDb:
This returns the right number of matches (2), but both at the same position and with the same length. I tried the same thing (with scope “all”) with Andrew’s mod and his works as expected. Just FYI.
Thanks for the report. Does it work as you expect in Andrew’s CF version, Java-based version, or both? I’ll have to look at this later, but for now you might want to use the mod if it’s working correctly.
Hey Boyan, you do know that reMatch is in CF 8, don’t you?
Todd, I’m sure Boyan is aware of it now. I believe his comment was posted before the CF8 beta was available.
I get the following error when I try to implement it:
“The names of user-defined functions cannot be the same as built-in ColdFusion functions.
The name reMatch is the name of a built-in ColdFusion function.
The CFML compiler was processing:
A cffunction tag beginning on line 1, column 2.
The error occurred in C:\Inetpub\wwwroot\test\regex3.cfm: line 1
1 : <cffunction name="reMatch" output="false">
2 : <cfargument name="regex" type="string" required="yes" />
3 : <cfargument name="string" type="string" required="yes" />
”
Any ideas?
That means you’re using ColdFusion 8, which includes a (much less flexible) reMatch function natively. I posted this well before any official word about CF8. Does the native reMatch not work for your needs?
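On CF8 the two ways around the error are to rename the UDF (to something like reMatchUDF, a made-up name) or to use the native function. Here is a short sketch of the latter, assuming a sample string. The native reMatch() returns an array of every matched substring, so grabbing just the first match needs an arrayLen() check:

```cfml
<!--- Hypothetical sample input --->
<cfset string = "Order 66 shipped in 3 boxes" />
<!--- Native CF8 reMatch(): always an array, empty when nothing matches --->
<cfset nums = reMatch("\d+", string) />
<cfif arrayLen(nums)>
	<cfset num = nums[1] />
<cfelse>
	<cfset num = "" />
</cfif>
```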