Building the spreadsheet can be a mindless typing exercise or a cut-and-paste task that's prone to error. However, if you have the Data Definition Language (DDL) script used to create the database and you have a tool that will make the spreadsheet for you from the create statements in the DDL, then you can skip the manual labor and go right to the analysis. Here's one way to do this.
CREATE TABLE [dbo].[tblLogin] ( [LoginKey] [int] IDENTITY (1, 3) NOT NULL , [Login] [varchar] (12) NOT NULL , [Passwd] [varchar] (15) NOT NULL , [RoleKey] [int] NOT NULL , [AcctID] [int] NOT NULL , [DateChanged] [datetime] NOT NULL ) ON [PRIMARY] GO CREATE TABLE [dbo].[tblModuleAccess] ( [AcctID] [varchar] (5) NOT NULL , [ModuleID] [int] NOT NULL IDENTITY (1, 4), [InvoiceTypeID] [int] NOT NULL , [ProcessType] [char] (1) NOT NULL , [AccessStatus] [int] NULL , [IPAddress] [varchar] (40) NOT NULL , [RequestDate] [datetime] NOT NULL ) ON [PRIMARY] GO CREATE TABLE [dbo].[tblItemData] ( [Item_ID] [int] NOT NULL , [Invoice_ID] [int] NULL , [Qty] [numeric](12, 2) NOT NULL , [Cost] [numeric](18, 6) NOT NULL , [CLIN] [varchar] (6) NOT NULL , [Uom_Code] [varchar] (2) NULL , [frt_cost] [numeric](15, 2) NULL , [Description] [varchar] (80) NOT NULL ) ON [PRIMARY] GO
<form name="eat" action="parseDDL2.cfm" method="post"> Paste DDL here...<br> <textarea cols="80" rows="15" name="Mouth"></textarea> <input type="submit" name="go" value="Run"> </form>
The following code checks for input, prints a header, and sets the variable Front to 1 to get started. If there's no input, it stops.
Internet Explorer will read a table directly from a Web page, but Netscape won't. Therefore, when printing the header, if the browser is not IE, the code creates a comma-separated list instead of starting a table. Because the purpose of creating this table is to let you copy and paste data into a spreadsheet, the table border is deliberately left at zero (the default) so that the spreadsheet won't wind up a mixture of bordered (populated) and non-bordered (empty) cells.
When looping through tables, find the Front of the first "create table" statement. Starting from that point, look for "go" preceded by chr(10) (linefeed), chr(13) (carriage return), or a space. That's the Back of the statement. The inner engine will (later) parse the statement. The outer engine will repeat the process until no more statement are seen. Front will become zero; the loop will end.
<!--- Stop if no input --->
<cfparam name="form.mouth" default="">
<cfif len(form.mouth) is 0>
Need input.
<cfabort>
</cfif>
<!--- Start table; print header --->
<cfif findNoCase("MSIE",cgi.http_user_agent)>
Copy and paste directly into a spreadsheet.
<p>
<table>
<tr><td>Table</td><td>Element</td>
<td>Type</td><td>Len</td>
<td>ID</td><td>Null</td></tr>
<cfelse>
Copy to a word processor, convert commas to tabs,<br>
and paste the result into a spreadsheet.
<p>
Table,Element,Type,Len,ID,Null<br>
</cfif>
<!--- Loop tables --->
<cfset Front=1>
<cfloop condition="Front neq 0">
<cfset Front=findNoCase("create table",form.mouth,Front)>
<cfif Front eq 0> <!--- No more tables --->
<cfbreak>
</cfif>
<cfset Back=
reFindNoCase("[#chr(10)##chr(13)# ]go",form.mouth,Front)>
<cfif Back lt 1>
A create statement is bad.
<cfabort>
</cfif>
<cfset Work=form.mouth>
<!--- Get table name --->
<cfset Table=mid(Work,Front,find("(",Work,Front)-Front)>
<cfset Table=reReplaceNoCase(Table,"create table","")>
<cfset Table=replace(Table,"].[",".")>
<cfset Table=replace(Table,"[","")>
<cfset Table=trim(replace(Table,"]",""))>
The remainder of the statement is a list of fields with their corresponding information often followed by "ON [PRIMARY]". Its first character is the first "(" after the beginning (Front) of the create statement; its length is the distance between that point and the end (Back) of the create statement. Assign this text to a variable you'll treat as a list long enough to convert it to an array.
The list contains more information about the identity condition than you really need for a quick analysis. You don't care whan number the automatic numbering starts with or the increment step size; so, replace statements boil down the identity to a simple placeholder: !ID!. A discussion of regular expression is in order here. The regular expression replace no case statement says to replace the following with "!ID!" if it's found: "identity" (regardless of case; because, this replacement function ignores case); one or more spaces; a left parenthesis (escaped with a backslash due to the special role of a parenthesis); one or more digits; zero or more spaces; a comma; zero or more spaces; one or more digits; and a right parenthesis (escaped with a backslash due to the special role of a parenthesis).
That leaves NOT NULL as the only useful two-word phrase. The code uses replace statements to merge the two words for simplicity.
The "on primary" statement is not needed; so, remove the following if found: "on" (regardless of case); one or more spaces; a left square bracket (escaped); "primary" (regardless of case); and a right square bracket (escaped).
<!--- Format field list --->
<cfset FieldList=
mid(Work,find("(",Work,Front),Back-find("(",Work,Front))>
<cfset FieldList=
reReplaceNoCase(FieldList,
"identity[ ]+\([0-9]+[ ]*,[ ]*[0-9]+\)","!ID!")>
<cfset FieldList=
reReplaceNoCase(FieldList,"not null","NOT_NULL","all")>
<cfset FieldList=
reReplaceNoCase(FieldList,"on[ ]+\[primary\]","")>
Doing this much handles lets most DDL be handled easily. However, the numeric and decimal types can specify how many points should be used to the right of the decimal. Instead of providing a single length attribute, these types provide two numbers separated by a comma. Because the numbers are separated by a comma, the line containing this kind of element type is split by this comma into two separate entries in the list so the first half of the logical row ends at the first number and the second half becomes a new row starting with the second number. We need to paste the two rows back together to become a single row in the list. To do this, we make a new copy of the list, one logical row at a time. If the beginning of a row begins with "0" (zero) or can be interpreted as a non-zero value, then it is pasted to the end of the previous row with a space (so the two halves of the logical row are treated as a single row again). Otherwise, the rows are separated by a comma (because they really are separate logical rows).
<!--- Rebuild field list ---> <cfset OKList=""> <cfloop list="#FieldList#" index="Row"> <cfset Row=trim(Row)> <cfif val(Row) or (left(Row,1) is "0")> <cfset OKList=OKList&" "&Row> <cfelse> <cfset OKList=OKList&","&Row> </cfif> </cfloop>
Now it's time to obtain the attributes for each field by looping through the fields. Recall that each group of attributes is separated from the others by a comma; so, the default delimiter (a comma) is OK.
Within each group, the attributes are separated by a space. Because the remaining square brackets and parentheses are now just "noise", including them as delimiters when converting the list to an array removes them as well.
The first "attribute" is probably empty; because, one of the delimiters precedes it. Therefore, prepare to throw away the first attribute if it's empty. Multiple delimiters are treated as one; so, we only have to do this once.
<!--- Parse field list ---> <cfloop list="#OKList#" index="ItemList"> <cfset ItemArray=listToArray(ItemList," []()")> <cfset Begin=1> <cfif not len(trim(ItemArray[Begin]))> <cfset Begin=Begin+1> </cfif>
The first real item is always the field name; so, assign it. The second item is the type; assign that also. "Begin" is used as a pointer to examine the next item when appropriate.
<cfset FieldName=ItemArray[Begin]> <cfset Begin=Begin+1> <cfset FieldType=ItemArray[Begin]> <cfset Begin=Begin+1>
Because you can't guarantee the sequence of the remaining non-numeric items, loop through them with useful tests until no more items remain in the array. Treat the first numeric item as the length. Paste a subsequent numeric item to it with a decimal point; it indicates the number of digits in the overall length that fall to the right of the decimal point. If the item begins with "!", it's an ID indicator. If it begins with "N", it refers to its null or not-null condition. Note that the "is" operator doesn't care if the "N" is upper case or lower case; it will match it either way.
<cfset FirstPass=1>
<cfloop condition="Begin le arrayLen(ItemArray)">
<cfif isNumeric(ItemArray[Begin])>
<cfif FirstPass>
<cfset FieldLen=ItemArray[Begin]>
<cfset FirstPass=0>
<cfelse>
<cfset FieldLen=FieldLen&"."&ItemArray[Begin]>
</cfif>
<cfelseif left(ItemArray[Begin],1) is "!">
<cfset FieldID="ID">
<cfelseif left(trim(ItemArray[Begin]),1) is "N">
<cfset FieldNull=trim(ItemArray[Begin])>
</cfif>
<cfset Begin=Begin+1>
</cfloop>
After the loop, initialize all unassigned attributes. and print the row. If the browser is not IE, print comma-separated values instead of a table row.
<cfparam name="FieldLen" default="">
<cfparam name="FieldID" default="">
<cfparam name="FieldNull" default="">
<!--- Print row --->
<cfif findNoCase("MSIE",cgi.http_user_agent)>
<cfoutput><tr>
<td>#Table#</td><td>#FieldName#</td>
<td>#FieldType#</td>
<td>#FieldLen#</td><td>#FieldID#</td>
<td>#FieldNull#</td></tr></cfoutput>
<cfelse>
<cfoutput>
#Table#,#FieldName#,#FieldType#,#FieldLen#,
#FieldID#,#FieldNull#<br>
</cfoutput>
</cfif>
If you don't clear the optional variables between passes, their values will persist, reporting incorrect attribute values for subsequent fields. So, clear the variables. Close the field loop to work with the next field. Move Front ahead one place so the next hunt won't find this same create statement over again, and close the statement loop to get the next statement. Close the table if IE was used. (A table wasn't used for Netscape.)
<!--- Clear optional variables; repeat until done --->
<cfset FieldLen="">
<cfset FieldID="">
<cfset FieldNull="">
</cfloop>
<cfset Front=Front+1>
</cfloop>
<cfif findNoCase("MSIE",cgi.http_user_agent)>
</table>
</cfif>
If you're using Netscape, the results will be displayed in a comma-separated format. Copy and paste the page into a word processor and convert commas to tabs. If you're using Word, the replace should look for "," and replace it with "^t", the meta-symbol for a tab in Word. Once the commas have been converted to tabs, copy and paste the page into an open spreadsheet; it will fall right in.
Now sort the spreadsheet by element. Since many of you will experience this tip in plain ASCII text, here's the equivalent of the sorted spreadsheet as a comma-separated list. Notice that AcctID has two different type definitions (int and varchar); this should get fixed. IPAddress is 40 characters; this seems a bit long. Copy this into a spreadsheet (after converting commas to tabs) to get the full effect.
Table,Element,Type,Len,ID,Null dbo.tblModuleAccess,AccessStatus,int,,,NULL dbo.tblLogin,AcctID,int,,,NOT_NULL dbo.tblModuleAccess,AcctID,varchar,5,,NOT_NULL dbo.tblItemData,CLIN,varchar,6,,NOT_NULL dbo.tblItemData,Cost,numeric,18.6,,NOT_NULL dbo.tblLogin,DateChanged,datetime,,,NOT_NULL dbo.tblItemData,Description,varchar,80,,NOT_NULL dbo.tblItemData,frt_cost,numeric,15.2,,NULL dbo.tblItemData,Invoice_ID,int,,,NULL dbo.tblModuleAccess,InvoiceTypeID,int,,,NOT_NULL dbo.tblModuleAccess,IPAddress,varchar,40,,NOT_NULL dbo.tblItemData,Item_ID,int,,,NOT_NULL dbo.tblLogin,Login,varchar,12,,NOT_NULL dbo.tblLogin,LoginKey,int,,ID,NOT_NULL dbo.tblModuleAccess,ModuleID,int,,ID,NOT_NULL dbo.tblLogin,Passwd,varchar,15,,NOT_NULL dbo.tblModuleAccess,ProcessType,char,1,,NOT_NULL dbo.tblItemData,Qty,numeric,12.2,,NOT_NULL dbo.tblModuleAccess,RequestDate,datetime,,,NOT_NULL dbo.tblLogin,RoleKey,int,,,NOT_NULL dbo.tblItemData,Uom_Code,varchar,2,,NULL
Imagine being able to extract data for 40 tables and draw useful conclusions in five minutes. Experiment. Change the parser to handle DDL for other databases, and tell us what you've learned. =Marty=