ColdFusion in Context: Regular Expression Input Interpreter

A string may be described in many different ways. Suppose you want to see how several regular expressions interpret a single input. This tip provides a way to do that, along with many sample expressions, including a rough check for E-mail addresses. To save coding time, it drives every test from structured input (a list of description/expression pairs) instead of making you write a separate test for each expression.

Form

Put all of this code in a single file, regexes.cfm. Start with a short introduction and the form. Use cfparam to give the form field a default value so the page works before the first submission. Use a textarea instead of a text field so you can enter (and later detect) control characters such as line breaks as part of the data. Close the form.

<h4>Regular Expression Input Interpreter</h4>
A string may be described in many different ways.<br>
Press the button to see how multiple regular expressions interpret your input.
<p>
This field...
<cfparam name="form.Data" default="">
<form method="post">
<cfoutput>
<textarea name="Data" rows="2"
cols="50">#htmlEditFormat(form.Data)#</textarea><br>
</cfoutput>
<input type="submit" name="Go">
</form>
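A minimal sketch of why the textarea matters, assuming a modern ColdFusion engine or Lucee for the script syntax: a browser submits a line break typed into a textarea as a carriage return plus line feed (CR LF), and [[:cntrl:]] finds it. Here the CR LF pair is built by hand with chr() to stand in for a real submission; the variable names are illustrative.

```cfml
<cfscript>
// Build by hand what a two-line textarea submission contains:
// "line one", then CR (chr(13)) + LF (chr(10)), then "line two".
oneLine  = "line one";
twoLines = "line one" & chr(13) & chr(10) & "line two";

// reFind returns the 1-based position of the first match, or 0.
cntrlInOne = reFind("[[:cntrl:]]+", oneLine);   // 0: no control characters
cntrlInTwo = reFind("[[:cntrl:]]+", twoLines);  // 9: the CR right after "line one"

writeOutput("#cntrlInOne# / #cntrlInTwo#");
</cfscript>
```

A single-line text field cannot carry the CR LF pair at all, so the control-character test could never fire from it.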

Data

Next, evaluate the data. If the field is empty (as it is before the first submission), say so. Otherwise, define two parallel arrays, one of descriptions and one of regular expressions, and fill in many description/expression pairs so that the code can display the description of every expression that matches the data. To save typing, you can do this inside cfscript tags. Leave a comment at the end to remind yourself to add more tests later. Close the script.

This is not a tutorial on regular expressions, so the expressions are not explained one by one, but by experimenting with the data you can see what causes each expression to fire. If an expression begins with a caret (^) and ends with a dollar sign ($), it must match the entire string, not just a portion of it. A backslash usually escapes the special character that follows it so that it is read as ordinary text; for example, \. matches a literal period rather than any character. Before certain letters, a backslash instead introduces a shorthand: \d matches any digit.
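A short sketch, assuming a modern ColdFusion engine or Lucee, that contrasts an anchored and an unanchored expression and shows the backslash at work both as an escape and as a shorthand. The variable names are illustrative.

```cfml
<cfscript>
// Unanchored: succeeds if ANY part of the string matches.
partial   = reFind("[[:digit:]]+", "abc123");    // 4: the digits start at position 4
// Anchored with ^ and $: the WHOLE string must match.
whole     = reFind("^[[:digit:]]+$", "abc123");  // 0: the letters spoil it
// An unescaped dot matches any character; \. matches only a literal dot.
anyChar   = reFind("^a.b$", "axb");              // 1
literal   = reFind("^a\.b$", "axb");             // 0
// \d is a shorthand for any digit, not an escaped "d".
shorthand = reFind("\d+", "abc123");             // 4

writeOutput("#partial# #whole# #anyChar# #literal# #shorthand#");
</cfscript>
```

ColdFusion strings do not treat the backslash specially, so each pattern reaches reFind exactly as written.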

<cfif not len(form.Data)>
  is empty
<cfelse>
  contains...<br>
  <cfscript>
  // Rd: Regular expression Description
  // Re: Regular Expression
  Rd=arrayNew(1);
  Re=arrayNew(1);
  // Define an Rd/Re pair for each test...
  Rd[1]="only digits";
  Re[1]="^[[:digit:]]+$";
  Rd[2]="digits";
  Re[2]="[[:digit:]]+";
  Rd[3]="only letters";
  Re[3]="^[[:alpha:]]+$";
  Rd[4]="letters";
  Re[4]="[[:alpha:]]+";
  Rd[5]="only letters and numbers, or just letters, or just numbers";
  Re[5]="^[[:alnum:]]+$";
  Rd[6]="only white space: spaces, tabs, newlines, carriage returns";
  Re[6]="^[[:space:]]+$";
  Rd[7]="spaces, tabs";
  Re[7]="[[:blank:]]+";
  Rd[8]="control characters (including newlines, carriage returns)";
  Re[8]="[[:cntrl:]]+";
  Rd[9]="only non-white space, non-control characters";
  Re[9]="^[[:graph:]]+$";
  Rd[10]="punctuation";
  Re[10]="[[:punct:]]+";
  Rd[11]="upper-case letters";
  Re[11]="[[:upper:]]+";
  Rd[12]="lower-case letters";
  Re[12]="[[:lower:]]+";
  Rd[13]="only a date in d/m/YYYY format; no range check was made";
  Re[13]="^[[:digit:]]{1,2}\/[[:digit:]]{1,2}\/[[:digit:]]{4}$";
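  Rd[14]="only a simple E-mail address"; // rough check only: some valid addresses (for example, with + or long top-level domains) are rejected
  Re[14]="^(([[:alnum:]]|-|_)+\.)*([[:alnum:]]|-|_)+@(([[:alnum:]]|-|_)+\.)+[[:alpha:]]{2,7}$";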
  Rd[15]="only a number";
  Re[15]="^[-+]?(([0-9]+\.?[0-9]*)|(\.[0-9]+))$";
  Rd[16]="a date in d/d/YY or d/d/YYYY format";
  Re[16]="((\d{2})|(\d))\/((\d{2})|(\d))\/((\d{4})|(\d{2}))";
  Rd[17]="typical first name and last name: assumes simplest capitalization";
  Re[17]="^[A-Z]{1}[a-z]{1,}[[:space:]][A-Z]{1}[a-zA-Z]{1,}$";
  Rd[18]="only a one- or two-digit month and 2-digit year: m/YY; checks month range";
  Re[18]="^([1-9]|(0[1-9])|(1[0-2]))/[[:digit:]]{2}$";
  Rd[19]="only a 2-digit month and 2-digit year: MM/YY; checks month range";
  Re[19]="^((0[1-9])|(1[0-2]))\/([[:digit:]]{2})$";
  Rd[20]="only a U.S. ZIP code: five digits, or ZIP+4 (five digits, a hyphen, four digits); no range check was made";
  Re[20]="^[[:digit:]]{5}(-[[:digit:]]{4})?$";
  Rd[21]="only a string that could be a simple variable name";
  Re[21]="^[A-Za-z]+[[:alnum:]]*$";
  Rd[22]="only a string that could be a variable name expressed as {scope or structure}.{simple variable}";
  Re[22]="^[A-Za-z]+[[:alnum:]]*\.[A-Za-z]+[[:alnum:]]*$";
  Rd[23]="only a social security number; no range check was made";
  Re[23]="^[[:digit:]]{3}-[[:digit:]]{2}-[[:digit:]]{4}$";
  Rd[24]="only a 24-hour time formatted as HH:MM:SS";
  Re[24]="^(([0-1][0-9])|([2][0-3])):([0-5][0-9]):([0-5][0-9])$";

  // (add more pairs here)

  </cfscript>
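The parallel arrays Rd and Re stay correct only as long as their indexes stay in step. As an alternative (not the approach this tip uses), a single array of structs keeps each description next to its expression. A sketch, assuming ColdFusion 10 or later (or Lucee) for the implicit array and struct syntax; the names tests, description, and pattern are hypothetical.

```cfml
<cfscript>
// Hypothetical alternative to the parallel Rd/Re arrays: each entry
// holds its description and its expression together.
tests = [
    {description="only digits", pattern="^[[:digit:]]+$"},
    {description="letters",     pattern="[[:alpha:]]+"},
    {description="punctuation", pattern="[[:punct:]]+"}
];

sample = "abc!";
found  = "";
for (i = 1; i lte arrayLen(tests); i = i + 1) {
    // reFind returns 0 when there is no match, which if() treats as false
    if (reFind(tests[i].pattern, sample)) {
        found = listAppend(found, tests[i].description, "|");
    }
}
writeOutput(found);  // letters|punctuation
</cfscript>
```

With this layout, adding a test means appending one entry, with no index numbers to keep in sync.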

Code Generator

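Loop over the parallel arrays within cfoutput tags. For each expression, call ColdFusion's regular expression find function, reFind. It returns the position of the first match, or 0 if there is none, so cfif treats any match as true. When an expression matches, display its description.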

  <cfoutput>
  <cfloop from="1" to="#arrayLen(Re)#" index="Nr">
  <cfif reFind(Re[Nr], form.Data)>
    - #Rd[Nr]#<br>
  </cfif>
  </cfloop>
  </cfoutput>
</cfif>
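The loop above reports only whether each expression matched. To also show which text triggered it, reFind accepts a fourth argument, returnsubexpressions; when it is true, reFind returns a structure whose pos and len arrays give the position and length of the match (both 0 when nothing matches). A sketch with an illustrative input string, assuming a modern ColdFusion engine or Lucee:

```cfml
<cfscript>
text   = "Order 66 shipped 12/25/2023";
// Fourth argument true: return a struct of arrays instead of a position.
result = reFind("[[:digit:]]+", text, 1, true);
if (result.pos[1] gt 0) {
    // pos[1] and len[1] describe the whole match; mid() extracts it.
    matched = mid(text, result.pos[1], result.len[1]);
    writeOutput("matched '#matched#' at position #result.pos[1]#");  // matched '66' at position 7
}
</cfscript>
```

Inside the tip's loop, the same call could supply the text to print after each description.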

Discussion

Browse regexes.cfm. As you change the data, watch which expressions fire, and add some of your own. There are many ways to describe a single piece of data; this tip helps you see some of them. =Marty=
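Watching expressions fire is easier with inputs chosen at the edges of what each one should accept. A sketch of a small boundary check, assuming a month/year expression that should accept months 1 through 12 (with or without a leading zero), a slash, and a two-digit year; the variable names and sample values are hypothetical.

```cfml
<cfscript>
// Month 1-12 (optional leading zero), a slash, then exactly two digits.
monthYear = "^([1-9]|0[1-9]|1[0-2])/[[:digit:]]{2}$";
inputs    = listToArray("1/24,01/24,09/24,10/24,12/24,00/24,13/24,1/2024");
expected  = listToArray("1,1,1,1,1,0,0,0");  // 1 = should match, 0 = should not
failures  = 0;
for (i = 1; i lte arrayLen(inputs); i = i + 1) {
    // min() turns "matched at position 1" into 1 and "no match" into 0
    got = min(reFind(monthYear, inputs[i]), 1);
    if (got neq expected[i]) {
        failures = failures + 1;
        writeOutput("unexpected result for #inputs[i]#<br>");
    }
}
writeOutput("#failures# failure(s)");
</cfscript>
```

Values such as 10/24 and 13/24 sit right at the edge of the month range, so they are the ones most likely to expose a missing or extra alternative.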