ColdFusion In Context: Analyze a Directory Tree

Suppose you've been working on a project for a while and dead pages have begun to litter your application. You could iteratively delete one page after another until the application stops working, but that would take a lot of time and might cost you good code. Or you could use ColdFusion to help you. Along the way, you'll put cfdirectory, cffile, and findNoCase() through their paces.

Here's a technique that lists the directory tree to a file and then parses the files in those directories to look for page references. For this example, put the code to build a tree into "dirtree.cfm" and the code to parse page references into "dirlink.cfm".

Get the Starting Point

To walk a directory tree, you need to know where to start. This code determines where your script is and lets you add a subdirectory to the URL if that subdirectory is to be the starting point. This way, you don't have to write the entire directory path in the URL to get started. (After all, if you were sure about all the details, you might not need this tip [grin].)

If the URL variable is empty, the code should tell you what to do. If a dot is used as the directory name, it should be converted to an empty string so the script will start from the directory in which it finds itself.

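<!--- Directory that holds this script (the path ends with a slash) --->
<cfset scriptdir=getDirectoryFromPath(cgi.cf_template_path)>
<cfparam name="url.dir" default="">
<cfset dirname=url.dir>
<!--- No starting point given: say what to do and stop --->
<cfif len(dirname) lt 1>
  add ?dir=mydirtree to the URL (where mydirtree... is what you want to check)
  <cfabort>
</cfif>
<!--- A dot means "start in the script's own directory" --->
<cfif dirname is ".">
  <cfset dirname="">
</cfif>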
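
The note at the end of this article says the live demonstration was changed to block upward directory traversal, but that code is not shown. What follows is only a guess at one way to do it, not the author's version: it refuses any url.dir value that contains "..", contains a drive colon, or begins with a slash or backslash. The wording of the message is made up for this sketch, and the check does not claim to be complete.

```cfm
<!--- Hypothetical guard, not the article's own fix (which is not shown) --->
<cfparam name="url.dir" default="">
<cfset dirname=url.dir>
<!--- Reject "..", a drive colon, or a leading slash or backslash --->
<cfif reFind("(\.\.|:|^[\\/])",dirname) gt 0>
  Please name a subdirectory below this script, without ".." or a drive letter.
  <cfabort>
</cfif>
<cfoutput>Starting point accepted: #htmlEditFormat(dirname)#</cfoutput>
```

In dirtree.cfm, only the cfif block would be needed, placed right after the checks on dirname. A single dot still passes, so the "start here" shortcut keeps working.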

Begin an Array to Hold Directory Names

To keep track of the directory tree listing and start at the desired starting point, create an array to hold the directory names and then add the name of the directory you want the code to search.

<cfset matrix=arrayNew(1)>
<cfset arrayAppend(matrix,dirname)>

Walk and Write the Directory Tree

Set a pointer to the first array entry and begin a loop that will continue until the pointer passes the last array entry.

<cfset item=1>
<cfloop condition="(item le arrayLen(matrix))">

List the directory being pointed to, and add its subdirectories to the array, ignoring the . and .. entries. Add a backslash between the parent directory path and each subdirectory. (This assumes you're in Windows; use a forward slash for UNIX/LINUX.)

<!--- Directory to list on this pass, relative to the script's directory --->
<cfset subdir=matrix[item]>
<cfset realdir=scriptdir & subdir>
<cfdirectory action="list" directory="#realdir#" name="fileList">
<cfloop query="fileList">
  <!--- Queue each real subdirectory, skipping the . and .. entries --->
  <cfif fileList.type is "Dir">
    <cfif (fileList.name is not ".") and (fileList.name is not "..")>
      <cfset arrayAppend(matrix,subdir & "\" & fileList.name)>
      <cfoutput>#subdir#\#fileList.name#</cfoutput><br>
    </cfif>
  </cfif>
</cfloop>

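Add 1 to the pointer and end the master loop. Then join the array into a comma-delimited list and write it to dirtree.txt in the script's directory.

<cfset item=item+1>
</cfloop>
<!--- Save the collected paths as one comma-delimited list --->
<cfset outline=arrayToList(matrix)>
<cfset outfile=scriptdir & "dirtree.txt">
<cffile action="write" file="#outfile#" output="#outline#">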
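
The walk above hard-codes a backslash, which is why it needs editing for UNIX/LINUX. One way around that, offered as an assumption rather than as part of the original script, is to borrow the separator from scriptdir: the script already relies on getDirectoryFromPath() returning a path that ends in the separator. The variable name sep is made up for this sketch, and "maze" is just the sample starting directory used in this article.

```cfm
<!--- sep is a hypothetical helper variable, not part of the original script --->
<cfset scriptdir=getDirectoryFromPath(cgi.cf_template_path)>
<!--- The last character of scriptdir is the platform's path separator --->
<cfset sep=right(scriptdir,1)>
<cfset subdir="maze">
<cfdirectory action="list" directory="#scriptdir##subdir#" name="fileList">
<cfloop query="fileList">
  <cfif (fileList.type is "Dir") and (fileList.name is not ".") and (fileList.name is not "..")>
    <!--- Join parent and child with sep in place of the literal backslash --->
    <cfoutput>#subdir##sep##fileList.name#</cfoutput><br>
  </cfif>
</cfloop>
```

If you take this route, the second script would need the same treatment wherever it tests for or appends a backslash.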

Prepare to Use the Directory Names

Up to this point, all this code has been in dirtree.cfm?dir=maze. Put the remaining code into dirlink.cfm?dir=dirtree.txt. This script will read each file in the directory tree and display each reference to another page (with the filename and line number of the reference).

You'll need to set some parameters in order to read the file of directory names. Tell the script what a newline is so it will know where each line ends in the files it reads. (The ASCII value for newline, or linefeed, is 10.) Tell it the directory the script is in, and assume the file of directory names is in this directory. Supply the name of the file as a variable ("listname") so that the script will be easy to modify; the contents of the file are read into "dirlist". Set up a table for the output to make it easy to copy the output of this script from the screen into an Excel spreadsheet for further use.

<cfset newline=chr(10)>
<cfset scriptdir=getDirectoryFromPath(cgi.cf_template_path)>
<!--- Change this name if the list of directories lives in another file --->
<cfset listname="dirtree.txt">
<cfset listname=scriptdir & listname>
<cffile action="read" file="#listname#" variable="dirlist">
<table border=0>
<tr><th>DIRECTORY</th><th>FILE</th><th>LINE</th>
<th>STATEMENT</th></tr>

Begin Looping Through Directories

You'll want to walk through each directory path named in the file. The delimiter that separates paths in the file is a comma. If the directory path does not already end in a backslash, add one so you'll be able to add filenames to the path later to read the files. (This assumes you're in Windows; make this a forward slash for UNIX/LINUX.) Append the directory to be listed to the directory of the script in order to get the full directory path to be listed.

<cfloop list="#dirlist#" delimiters="," index="dirname">
<!--- cffile adds a newline when it writes by default; strip it from the last path --->
<cfset dirname=trim(dirname)>
<cfif right(dirname,1) is not "\">
  <cfset dirname=dirname & "\">
</cfif>
<cfset thisdir=scriptdir & dirname>
<cfdirectory action="list" directory="#thisdir#" name="filelist">

Read, Parse, and Display Page References

Loop through the directory to read the useful files it contains. This example reads ColdFusion pages, HTML files, Active Server Pages, and JavaServer Pages.

<cfloop query="filelist">
<cfif (findNoCase(".cfm",filelist.name)) or
      (findNoCase(".htm",filelist.name)) or
      (findNoCase(".asp",filelist.name)) or
      (findNoCase(".jsp",filelist.name))>
  <cfset thisfile=thisdir & filelist.name>
  <cffile action="read" file="#thisfile#" variable="text">

Start a line counter and parse the file, one line at a time. You've already defined the delimiter that separates lines. If the line has at least one character, look for items of interest. If any of those items is found, replace the less-than and greater-than signs in the line with something safer (an underscore and a tilde here) and display the result in the table. This example will display the directory, file, line number, and line contents for each line that contains a link, form action, location, refresh, cfinclude, or cfmodule. The "findNoCase" construct, which ignores case and returns the position of the match (0 when there is none), is used here in preference to the "contains" construct.

  <!--- lineno counts the lines the loop delivers; work is the trimmed line --->
  <cfset lineno=0>
  <cfloop list="#text#" delimiters="#newline#" index="line">
    <cfset work=trim(line)>
    <cfset lineno=lineno+1>
    <cfif len(work) gt 0>
      <!--- Report lines that look like a reference to another page --->
      <cfif (findNoCase("<a",work) gt 0) or
            ((findNoCase("<form",work) gt 0) and
             (findNoCase(" action",work) gt 0)) or
            (findNoCase("location",work) gt 0) or
            (findNoCase("refresh",work) gt 0) or
            (findNoCase("cfinclude",work) gt 0) or
            (findNoCase("cfmodule",work) gt 0)>
        <!--- Swap out angle brackets so the browser shows the line as text --->
        <cfset work=REReplace(work,"<","_","all")>
        <cfset work=REReplace(work,">","~","all")>
        <cfoutput>
        <tr><td>#dirname#</td><td>#getFileFromPath(thisfile)#</td>
        <td>#lineno#</td><td>#work#</td></tr>
        </cfoutput>
      </cfif>
    </cfif>
  </cfloop>
</cfif>

Finally, close things up. Finish the loop for reading this directory, finish the loop for reading each directory from the file of directories, and end the table.

</cfloop>
</cfloop>
</table>
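
One caveat about the line counter: ColdFusion list loops skip empty list elements, so in a file that uses bare linefeeds, blank lines are never counted and the reported line numbers can run low. (Files with Windows line endings are less affected, because each blank line still holds a carriage return.) A possible workaround, assuming ColdFusion 8 or later, where listToArray() accepts an includeEmptyFields argument, is to split the text into an array first. The sample text below stands in for a file's contents, and htmlEditFormat() is swapped in for the underscore-and-tilde replacement; neither is part of the original script.

```cfm
<!--- Sample text with a blank second line; stands in for a file's contents --->
<cfset newline=chr(10)>
<cfset text='first line' & newline & newline & '<a href="next.cfm">next</a>'>
<!--- Third argument (includeEmptyFields) keeps the blank line as an element --->
<cfset lines=listToArray(text,newline,true)>
<cfloop from="1" to="#arrayLen(lines)#" index="lineno">
  <cfset work=trim(lines[lineno])>
  <cfif len(work) gt 0>
    <!--- htmlEditFormat escapes the angle brackets for display --->
    <cfoutput>#lineno#: #htmlEditFormat(work)#</cfoutput><br>
  </cfif>
</cfloop>
```

Because the blank line stays in the array, the link keeps its true position as line 3. The array index doubles as the line number, so no separate counter is needed.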

Use and Extend the Concept

To use these scripts, put them in the same directory and run them, one after the other. You can re-run the second script as desired; just remember to run the first script when the directory tree changes.

By replacing the items to be searched for in the second script, you can extend this concept to list decision points or ColdFusion variables. You can have multiple scripts use the file created by the first script: a script for each purpose. Happy hunting! =Marty=
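
As a rough illustration of swapping in other search items, the sketch below keeps the items in a list and tests a line against each one, here looking for decision points. The variable names needles, needle, and hit, and the sample line in work, are invented for this example and are not part of the original scripts.

```cfm
<!--- needles is a hypothetical comma-delimited list of strings to look for --->
<cfset needles="<cfif,<cfelseif,<cfswitch,<cfcase">
<!--- work stands in for one trimmed line read from a file --->
<cfset work="<cfif x gt 1>">
<cfset hit=false>
<cfloop list="#needles#" index="needle">
  <cfif findNoCase(needle,work) gt 0>
    <!--- Stop at the first item found on this line --->
    <cfset hit=true>
    <cfbreak>
  </cfif>
</cfloop>
<cfoutput>Line matches: #hit#</cfoutput>
```

Keeping the items in one list means a script for a new purpose may need nothing more than a different value for needles, provided none of the items contains a comma.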

[Code for the actual demonstration was changed behind the scenes to block upward directory traversal, and a typographical error, "the the", has been corrected.]