We are currently developing coding guidelines, and I wanted to make sure URLs were valid and consistently formatted before they were inserted into our SQL tables. Here is a script to standardize the URLs you receive through form data.
This is my first version of a URL cleaning script. It takes a variable that is assumed to be a URL, trims it, and lowercases it. Then it checks whether the string is missing http://. If so, the script adds the prefix and attempts to load the page; a status code of 200 OK means the page loaded, and that version of the URL is kept. In all, the script checks for three different versions of the URL.
```cfm
<cfparam name="FORM.CompanyURL" default="asonbartholme.com">

<!--- Normalize the raw form value before testing it --->
<cfset dirtyURL = trim(lcase(FORM.CompanyURL))>
Start: <cfoutput>#dirtyURL#</cfoutput><br/>

<cfif dirtyURL DOES NOT CONTAIN "http://">
    <!--- Version 1: add the missing http:// prefix --->
    <cfset tempURL = "http://" & dirtyURL>
    Temp1: <cfoutput>#tempURL#</cfoutput><br/>
    <cfhttp url="#tempURL#" method="GET" timeout="15"></cfhttp>
    <cfif cfhttp.statusCode EQ "200 OK">
        <cfset cleanURL = tempURL>
    <cfelseif dirtyURL DOES NOT CONTAIN "www.">
        <!--- Version 2: add http://www. when the bare host did not load --->
        <cfset tempURL = "http://www." & dirtyURL>
        Temp2: <cfoutput>#tempURL#</cfoutput><br/>
        <cfhttp url="#tempURL#" method="GET" timeout="15"></cfhttp>
        <cfif cfhttp.statusCode EQ "200 OK">
            <cfset cleanURL = tempURL>
        </cfif>
    </cfif>
<cfelse>
    <!--- Version 3: the value already contains http://, so test it as given --->
    <cfset tempURL = dirtyURL>
    Temp3: <cfoutput>#tempURL#</cfoutput><br/>
    <cfhttp url="#tempURL#" method="GET" timeout="15"></cfhttp>
    <cfif cfhttp.statusCode EQ "200 OK">
        <cfset cleanURL = tempURL>
    </cfif>
</cfif>

<!--- Flag anything that never returned 200 OK --->
<cfif NOT isDefined("cleanURL")>
    <cfset cleanURL = "INVALID: " & dirtyURL>
</cfif>
<cfoutput>Finish: #cleanURL#</cfoutput>
```
I prepend INVALID: to the bad URLs to flag that the script could not connect to them successfully. One could run another script on the table to resolve those erroneous URLs.
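As a rough sketch of that follow-up step, a helper like the one below could strip the INVALID: marker from a stored value so the original input can be re-checked or corrected by hand. The function name stripInvalidPrefix and the sample values are assumptions made up for illustration; they are not part of the script above, and reading the flagged rows out of your own table is left out.

```cfm
<!--- Hypothetical helper (name and usage are assumptions, not from the original script) --->
<cffunction name="stripInvalidPrefix" returntype="string" output="false">
    <cfargument name="storedURL" type="string" required="true">
    <!--- Must match the marker the cleaning script prepends --->
    <cfset var prefix = "INVALID: ">
    <!--- compare() is case-sensitive, so only the exact uppercase marker matches --->
    <cfif len(arguments.storedURL) GT len(prefix)
          AND compare(left(arguments.storedURL, len(prefix)), prefix) EQ 0>
        <!--- Drop the marker and return the value as it was originally entered --->
        <cfreturn removeChars(arguments.storedURL, 1, len(prefix))>
    </cfif>
    <!--- Values without the marker are returned unchanged --->
    <cfreturn arguments.storedURL>
</cffunction>

<!--- Sample values only; in practice these would come from the flagged table rows --->
<cfoutput>
    #stripInvalidPrefix("INVALID: example.com")#<br/>
    #stripInvalidPrefix("http://example.com")#<br/>
</cfoutput>
```

The recovered value could then be passed back through the cleaning script, or listed for someone to fix manually.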
Like I said, this was my first attempt to standardize the URLs that we receive through form data. This code hasn’t been tested enough to deem it bulletproof. I don’t claim to be a code ninja, so if you know of a better method or would like to tweak this version, please let me know.







