Take one part regular expressions, one part ColdFusion and one part Fark.com, and you can see all the pics posted in the story comments, minus all the comments.
I have been trying to hone my RegEx skills to use for good and a little bit of evil, so I decided to challenge myself. I read Fark.com almost daily and love the pictures that people post along with comments about the particular story. During my lunch break, I only have a limited amount of time to browse without a content filter, so I end up skimming the comment pages for the funny/interesting pics.
It would be nice to see the Fark comment pages without the conversation, showing only the pictures. Here’s what we need to make this happen:
Home Page
- Grab the content from the live page
- Extract the list of categories (tags), stories, comment counts and story IDs
- Pass story title and ID to picture display page
Pic Page
- Grab the comments from the live page based on the story ID
- Parse out all the <img> tags within the comments
- Discard every image with “fark” in its tag, so site-related images are not displayed
- Show only user submitted pictures and a link to get back to the main page
We use ColdFusion's cfhttp tag to pull the home page content.
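As a rough idea of what that fetch can look like, here is a minimal sketch. It is not the code from the downloadable script; the URL, timeout, and user agent string are illustrative assumptions.

```cfml
<!--- Fetch the live home page. URL, timeout and user agent are assumptions for this sketch. --->
<cfhttp url="http://www.fark.com/" method="get" timeout="15"
        useragent="Mozilla/5.0 (compatible; pic-browser sketch)" />

<!--- cfhttp.statusCode looks like "200 OK"; stop early on anything else. --->
<cfif Left(cfhttp.statusCode, 3) NEQ "200">
    <cfoutput>Could not load the page: #HTMLEditFormat(cfhttp.statusCode)#</cfoutput>
    <cfabort>
</cfif>

<!--- The raw HTML of the page is now available as a string. --->
<cfset pageHtml = cfhttp.fileContent>
<cfoutput>Fetched #Len(pageHtml)# characters.</cfoutput>
```

Checking the status code first keeps the regular expressions from running against an error page.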

All that content is held in the CFHTTP.FileContent variable. Then, we use the REMatchNoCase() function to build an array with one element per story. Looping over that array, we can parse out the back references captured by the complex regular expression we used to scrape the page.
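Here is a hedged sketch of that match-then-capture pattern. The HTML is made-up sample markup standing in for CFHTTP.FileContent, and the pattern is a simplified one written for that sample, so neither reflects Fark's real page nor the regular expression in the download. Because REMatchNoCase() returns only the full matched text, the sketch re-runs the pattern with REFindNoCase() to get the position of each back reference.

```cfml
<!--- Hypothetical markup standing in for CFHTTP.FileContent; the real Fark HTML differs. --->
<cfsavecontent variable="pageHtml">
<tr><td class="tag">Funny</td><td><a href="/comments/1001">Man bites dog</a></td><td>(57)</td></tr>
<tr><td class="tag">Weird</td><td><a href="/comments/1002">Cat elected mayor</a></td><td>(112)</td></tr>
</cfsavecontent>

<!--- Four capture groups: tag, story ID, title, comment count. --->
<cfset storyRegex = '<td class="tag">([^<]+)</td><td><a href="/comments/([0-9]+)">([^<]+)</a></td><td>\(([0-9]+)\)</td>'>

<!--- One array element per story, each holding the full matched text. --->
<cfset storyRows = REMatchNoCase(storyRegex, pageHtml)>

<cfset stories = ArrayNew(1)>
<cfloop array="#storyRows#" index="row">
    <!--- Re-run the pattern on a single row to get the back reference positions. --->
    <cfset found = REFindNoCase(storyRegex, row, 1, true)>
    <cfset story = StructNew()>
    <!--- pos[1]/len[1] describe the whole match; capture groups start at index 2. --->
    <cfset story.tag      = Mid(row, found.pos[2], found.len[2])>
    <cfset story.id       = Mid(row, found.pos[3], found.len[3])>
    <cfset story.title    = Mid(row, found.pos[4], found.len[4])>
    <cfset story.comments = Mid(row, found.pos[5], found.len[5])>
    <cfset ArrayAppend(stories, story)>
</cfloop>

<cfdump var="#stories#">
```

Storing each story as a struct keeps the later display code readable, at the cost of running the pattern twice per story.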

We put those variables into a simple table that we can look over to decide which stories we are interested in, with a link that spits out only the pictures from that story's comments.
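A table like that could be sketched as follows. The stories array is filled by hand so the snippet runs on its own, and the page name pics.cfm and the URL parameters storyID and title are hypothetical names chosen for this example.

```cfml
<!--- Sample data; in practice this array would come from the scraping step. --->
<cfset stories = ArrayNew(1)>
<cfset story = StructNew()>
<cfset story.tag = "Funny">
<cfset story.id = "1001">
<cfset story.title = "Man bites dog">
<cfset story.comments = "57">
<cfset ArrayAppend(stories, story)>

<table>
<tr><th>Tag</th><th>Story</th><th>Comments</th></tr>
<cfoutput>
<cfloop array="#stories#" index="s">
<tr>
    <td>#HTMLEditFormat(s.tag)#</td>
    <!--- pics.cfm, storyID and title are hypothetical names for the picture page and its parameters. --->
    <td><a href="pics.cfm?storyID=#URLEncodedFormat(s.id)#&amp;title=#URLEncodedFormat(s.title)#">#HTMLEditFormat(s.title)#</a></td>
    <td>#HTMLEditFormat(s.comments)#</td>
</tr>
</cfloop>
</cfoutput>
</table>
```

Escaping the scraped values with HTMLEditFormat() and URLEncodedFormat() keeps stray markup in a headline from breaking the table or the link.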

Onto the picture page. We have a story ID that we use to get the page contents from Fark. We take that content and apply two different regular expressions. The first isolates the user-submitted comments; the second extracts only the <img> tags.
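The two-pass idea can be sketched like this. The markup, the class name comment, and the image hosts are invented for the example, and both patterns are simplified stand-ins for the ones in the download. The first pattern would stop too early if a comment contained a nested div, so treat it as an illustration only.

```cfml
<!--- Hypothetical comment page markup; the real Fark HTML differs. --->
<cfsavecontent variable="pageHtml">
<div class="header"><img src="http://img.fark.example/logo.gif"></div>
<div class="comment">First! <img src="http://example.com/cat.jpg" alt="cat"></div>
<div class="comment">No picture here.</div>
<div class="comment"><img src="http://img.fark.example/totalfark-badge.gif"> <IMG SRC="http://example.org/dog.png"></div>
</cfsavecontent>

<!--- Pass 1: isolate each user comment (lazy match up to the first closing div). --->
<cfset commentBlocks = REMatchNoCase('<div class="comment">.*?</div>', pageHtml)>

<!--- Pass 2: collect every img tag found inside those comments. --->
<cfset imgTags = ArrayNew(1)>
<cfloop array="#commentBlocks#" index="block">
    <cfset tagsInBlock = REMatchNoCase("<img[^>]*>", block)>
    <cfloop array="#tagsInBlock#" index="imgTag">
        <cfset ArrayAppend(imgTags, imgTag)>
    </cfloop>
</cfloop>

<cfdump var="#imgTags#">
```

Running the image pattern only inside the comment blocks means images in the page header or navigation never reach the array.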

Once we have the array built, we simply loop through it and display the images. I filter out the images that contain “fark” within the tag, because that usually denotes a “Total Fark” member badge.
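A minimal sketch of that loop and filter, using a hand-built array of tags so it runs on its own. The image URLs are made up, and index.cfm is a hypothetical name for the main page.

```cfml
<!--- Sample img tags; in practice these come from the regular expression step. --->
<cfset imgTags = ArrayNew(1)>
<cfset ArrayAppend(imgTags, '<img src="http://example.com/cat.jpg" alt="cat">')>
<cfset ArrayAppend(imgTags, '<img src="http://img.fark.example/totalfark-badge.gif">')>

<cfoutput>
<cfloop array="#imgTags#" index="imgTag">
    <!--- FindNoCase returns 0 when "fark" is absent, so only those tags are shown. --->
    <cfif NOT FindNoCase("fark", imgTag)>
        <p>#imgTag#</p>
    </cfif>
</cfloop>
<!--- index.cfm is a hypothetical name for the story list page. --->
<p><a href="index.cfm">Back to the story list</a></p>
</cfoutput>
```

Note that this echoes third-party markup as-is; pulling out just the src value and building a fresh tag would be a safer variation.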
We place a simple link to get back to the main page, and possibly pass in the story name to the pic page. There is limitless customization that can be done to this script. I was more concerned about getting the correct data to display.
Download the entire script. Here are the working pages, which include the complex RegEx that was used to get the desired content.
Disclaimer time. Fark.com is a great site with loads of entertaining content that keeps me coming back. The tutorial is more about applying the principles of regular expressions with ColdFusion to get the data you want than about exploiting Fark. The same principles can be used on other master/detail sites.
I would have had all the code inline, but my plugin for that was acting screwy with the code. Any suggestions for displaying source code in WordPress?


February 8th, 2009 at 11:10 pm
Take a look at SyntaxHighlighter with a Coldfusion brush for code viewing on page: http://code.google.com/p/syntaxhighlighter/
February 9th, 2009 at 8:35 pm
Haha, this is awesome. It’s like optimized time wasting.
=Ryan
ryan@adobe.com
February 9th, 2009 at 9:45 pm
Indeed it is, Ryan. The cfhttp tag opens a world of productivity and shenanigans. I’m thinking of a blog post series called, “Fun with cfhttp”.
Thanks for stopping by!
October 16th, 2009 at 11:07 am
This is very amazing, ColdFusion is completely amazing!
October 30th, 2009 at 7:05 am
hi,
u have got a very powerful regular expression…
good game..