Suppose you wanted ColdFusion to correctly parse a file consisting of variable-length fields. How would you do it? Here's an example from ANSI X12, a transfer method with variable-length fields, to show why the obvious approach to parsing this kind of data isn't the right approach. Along the way, you'll use ColdFusion index loops and regular expressions.
G72*41*02***-2.180*36.000*CA
All users of ANSI X12 work from a common dictionary. They don't use every element available, but if they do use an element, it has the meaning given in the standard.
The power of X12 is that its segment and element meanings have been agreed on through decades of negotiation across all the business sectors you might think of, the X12 transactions built up from these pre-defined building blocks have achieved the same broad level of acceptance, and the X12 protocol uses very little bandwidth. And did I mention that X12 includes checksums and automated feedback that assures the sender that the transmission was received?
XML, a transfer method using named fields, takes perhaps six times as much space as X12 to pass the same amount of data, lacks these additional features, and XML document formats are still hammered out on a case-by-case basis (or at best by a few companies within a single type of business).
You might be tempted to make this the heart of your parser to break down a single line:
<cfloop index="Field" list="#Record#" delimiters="*">
This would be OK if you never had two delimeters with nothing between them. However, you saw in the example that when an optional value is left out, the delimiters before and after it touch.
G72*41*02***-2.180*36.000*CA
If you tried to parse this line using the method above, here is how the loop would break it down: G72 (the label), 41, 02, -2.180, 36.000, CA. This would cause -2.180 to appear to be the charge number (element 3) instead of the charge amount (element 5). Your parser needs to break it down as G72, 41, 02, {empty}, {empty}, -2.180, 36.000, CA.
So what causes this problem? It turns out that the index loop treats multiple consecutive delimiters as a single delimiter; it will not loop at all for the empty values between them.
So in order to loop for empty values as well as non-empty ones, what can you do? One approach is to replace those empty values with a predefined non-empty value. Then the loop will run the proper number of times, and the logic within the loop will know which field number corresponds with each value (or with your standard replacement for an empty value).
<cfset Record=REReplace(#Record#,"\*{2}","*{EMPTY}*","ALL")>
This almost does the job. It expands the original record into this:
G72*41*02*{EMPTY}**-2.180*36.000*CA
So what happened? You specified that you wanted all occurences of two consecutive delimiters to be replaced with *{EMPTY}*, but it only picked up the first occurrence. The problem seems to be that characters used in the first replacment are ignored as the search continues. The scan doesn't see the first of two asterisks following the first empty value; because, that asterisk is part of what the first occurrence was replaced with. If you experiment on your own, you'll see that it picks up every OTHER pair. A stream of six asterisks would be converted to this:
*{EMPTY}**{EMPTY}**{EMPTY}*
So now what? It turns out that there's a simple fix: just do the replacement twice. With an input form and index loops, here's a demonstration parser for records of variable-length delimited fields. Thanks to Don Gilliland for pointing out that some browsers need the textarea field to have a wrap="physical" attribute for proper display within the input field. chr(10) is a linefeed, a reasonable character to look for in order to recognize the end of a line of text.
<cfif not isDefined("form.Doc")>
<cfset Doc="">
</cfif>
<cfform name="parser" action="parseDoc.cfm" method="post">
Paste an X12 document here, and press "Parse"<br>
<cfoutput><pre>
<textarea name="Doc" wrap="physical" rows="8" cols="50">#trim(Doc)#</textarea>
<br></pre></cfoutput>
<input type="submit" name="go" Value="Parse">
<p>
<hr>
</cfform>
<cfif len(trim(Doc)) gt 0>
<cfloop index="Record" list="#trim(Doc)#"
delimiters="#chr(10)#">
<cfset Demo="">
<cfif len(trim(Record)) gt 0>
<cfset FieldNo=-1>
<cfset Record=ReReplace(#Record#,"\*{2}","*{EMPTY}*","all")>
<cfset Record=ReReplace(#Record#,"\*{2}","*{EMPTY}*","all")>
<!--- (yes, the above Record must be used twice) --->
<cfloop index="Field" list="#Record#" delimiters="*">
<cfset FieldNo=#FieldNo#+1>
<cfif FieldNo is 0>
<!--- FieldNo 0 is the label for the record type --->
<cfset RecType=trim(Field)>
<cfif len(RecType)>
<!--- In real life, you'd do a lookup here to get the record layout --->
<cfset Demo=Demo&"RecType: #Field#">
<cfelse>
<cfset Demo=Demo&"RecType: {EMPTY}">
</cfif>
<cfelse>
<!--- Subsequent fields are really fields --->
<!--- In real life, you'd look up this field number for this record type --->
<cfset Field=trim(Field)>
<cfif len(Field)>
<cfset Demo=Demo&", FieldNo_#FieldNo#: #Field#">
<cfelse>
<cfset Demo=Demo&", FieldNo_#FieldNo#: {EMPTY}">
</cfif>
</cfif>
</cfloop>
</cfif>
<cfoutput>#Demo#</cfoutput><br>
</cfloop>
</cfif>
GS*GP*802896274*4058407200*20001108*1312*47*T*004010UCS ST*880*0060 G01*20000916*6076*20000927*426151-ME G61*IC*MARTY LADNER G62*11*20000927 G27*M****DEF G23*01*3*20000916*01*2*20000926*10*20000927*11*18286*896024*914310 G25*PC*04 N1*BT*PARENTSTORE*9*006943773ME00 N3*236 MONTGOMERY HWY N4*BIRMINGHAM*AL*35216*USA N1*ST*LOCALSTORE*9*006943773ME00 N3*2548 REDSTONE RD SW N4*HUNTSVILLE*AL*35803*USA N1*RE*LIGERS*9*8028962740001 N3*3410 MCGEHEE RD N4*MONTGOMERY*AL*36111*USA G17*60*CA*26.09*074816211179******60*CA G69*CHOCOLATE MUFFINS G20*12*16*OZ G17*330*CA*23.34*074816261452******330*CA G69*BRAN MUFFINS G20*12*11*OZ G17*30*CA*26.09*074816291547******30*CA G69*PUMPKIN MUFFINS G20*12*18*OZ G31*420*CA G33*914310 SE*28*0060 GE*1*47