We are currently developing coding guidelines, and I wanted to make sure URLs were valid and consistently formatted before they were inserted into our SQL tables. Here is a script to standardize the URLs you receive through form data.

This is my first version of a URL cleaning script. It takes a form variable that is assumed to be a URL, then trims and lowercases it. If the string does not contain http://, the script prepends it and attempts to load the page. A status code of 200 OK means the page loaded, and that URL is kept as the clean version. If the load fails and the string does not contain www., the script retries with http://www. prepended, making up to three requests in total.

<cfparam name="FORM.CompanyURL" default="asonbartholme.com">
<cfset dirtyURL = trim(lcase(FORM.CompanyURL))>
Start: <cfoutput>#dirtyURL#</cfoutput><br/>
<cfif dirtyURL DOES NOT CONTAIN "http://">
    <cfset tempURL = "http://" & dirtyURL>
    Temp1: <cfoutput>#tempURL#</cfoutput><br/>
    <cfhttp url="#tempURL#" method="GET" timeout="15"></cfhttp>
    <cfif cfhttp.statusCode EQ "200 OK">
        <cfset cleanURL = TempURL>
    <cfelse>
        <cfif dirtyURL DOES NOT CONTAIN "www.">
            <cfset tempURL = "http://www." & dirtyURL>
            Temp2: <cfoutput>#tempURL#</cfoutput><br/>
            <cfhttp url="#tempURL#" method="GET" timeout="15"></cfhttp>
            <cfif cfhttp.statusCode EQ "200 OK">
                <cfset cleanURL = TempURL>
            <cfelse>
                <!--- Note: this builds the same URL as Temp2, so it acts as a second try of the www. form. --->
                <cfif dirtyURL DOES NOT CONTAIN "http://www.">
                    <cfset tempURL = "http://www." & dirtyURL>
                    Temp3: <cfoutput>#tempURL#</cfoutput><br/>
                    <cfhttp url="#tempURL#" method="GET" timeout="15"></cfhttp>
                    <cfif cfhttp.statusCode EQ "200 OK">
                        <cfset cleanURL = TempURL>
                    </cfif>
                </cfif>
            </cfif>
        </cfif>
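    </cfif>
<cfelse>
    <!--- The input already includes http://, so test it as given instead of leaving cleanURL undefined. --->
    <cfhttp url="#dirtyURL#" method="GET" timeout="15"></cfhttp>
    <cfif cfhttp.statusCode EQ "200 OK">
        <cfset cleanURL = dirtyURL>
    </cfif>
</cfif>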
<cfif Not IsDefined('cleanURL')>
    <cfset cleanURL = "INVALID: " & dirtyURL>
</cfif>
<cfoutput>Finish: #cleanURL#</cfoutput>
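
For anyone who wants to tweak this, here is a minimal sketch of the same idea written as a function that loops over candidate URLs instead of nesting the checks. The function name firstReachableURL and the example.com default are hypothetical, not part of the script above. It also assumes that input already containing http:// should be tested as given, and it tries each distinct candidate only once.

```cfm
<!--- Hypothetical helper; the name firstReachableURL is illustrative only. --->
<cffunction name="firstReachableURL" returntype="string" output="false">
    <cfargument name="rawURL" type="string" required="true">
    <cfset var cleaned = trim(lcase(arguments.rawURL))>
    <cfset var candidates = arrayNew(1)>
    <cfset var candidate = "">
    <cfset var resp = "">

    <!--- Build the list of URLs to try, in order. --->
    <cfif cleaned CONTAINS "http://">
        <!--- Assumption: a URL that already has http:// is tested unchanged. --->
        <cfset arrayAppend(candidates, cleaned)>
    <cfelse>
        <cfset arrayAppend(candidates, "http://" & cleaned)>
        <cfif cleaned DOES NOT CONTAIN "www.">
            <cfset arrayAppend(candidates, "http://www." & cleaned)>
        </cfif>
    </cfif>

    <!--- Return the first candidate that answers with 200 OK. --->
    <cfloop array="#candidates#" index="candidate">
        <cfhttp url="#candidate#" method="GET" timeout="15" result="resp"></cfhttp>
        <cfif resp.statusCode EQ "200 OK">
            <cfreturn candidate>
        </cfif>
    </cfloop>

    <!--- Nothing loaded: flag the URL the same way the script does. --->
    <cfreturn "INVALID: " & cleaned>
</cffunction>

<!--- Usage: the default value here is a placeholder. --->
<cfparam name="FORM.CompanyURL" default="example.com">
<cfoutput>Finish: #firstReachableURL(FORM.CompanyURL)#</cfoutput>
```

Keeping the candidates in an array makes it easier to add another variation later, such as an https:// form, without adding another level of nesting.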

I prepend INVALID: to the bad URLs to flag that the script could not connect to them. One could run another script on the table to resolve those erroneous URLs.
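
As a rough sketch of what that follow-up script might start with, the query below lists the flagged rows so they can be reviewed or retried. The datasource name myDSN, the table Companies, and the columns CompanyID and CompanyURL are all assumptions for illustration. Substitute whatever your schema actually uses.

```cfm
<!--- Hypothetical names: datasource myDSN, table Companies, columns CompanyID and CompanyURL. --->
<cfquery name="badURLs" datasource="myDSN">
    SELECT CompanyID, CompanyURL
    FROM Companies
    WHERE CompanyURL LIKE <cfqueryparam value="INVALID: %" cfsqltype="cf_sql_varchar">
</cfquery>

<!--- Show each flagged URL with the INVALID: prefix stripped off. --->
<cfoutput query="badURLs">
    #CompanyID#: #htmlEditFormat(replace(CompanyURL, "INVALID: ", ""))#<br/>
</cfoutput>
```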

Like I said, this was my first attempt to standardize the URLs that we receive through form data. This code hasn’t been tested enough to deem it bulletproof. I don’t claim to be a code ninja, so if you know of a better method or would like to tweak this version, please let me know.
