Take one part regular expressions, one part ColdFusion, and one part Fark.com, and you can see all the pics posted in the story comments, minus the comments themselves.

I have been trying to hone my RegEx skills to use for good and a little bit of evil, so I decided to challenge myself. I read Fark.com almost daily and love the pictures that people post along with comments about the particular story. During my lunch break, I only have a limited amount of time to browse without a content filter, so I end up skimming the comment pages for the funny/interesting pics.

It would be nice to see the comment pages on Fark with only the pictures and none of the conversation. Here’s what we need to make this happen:

Home Page

  • Grab the content from the live page
  • Extract the list of categories (tags), stories, comment counts and story IDs
  • Pass story title and ID to picture display page

Pic Page

  • Grab the comments from the live page based on the story ID
  • Parse out all the <img> tags within the comments
  • Discard every image with “fark” in its tag so that site-related images are not displayed
  • Show only user submitted pictures and a link to get back to the main page

We use ColdFusion’s cfhttp tag to pull the home page content as seen below:

[Image: main page cfhttp code]
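For readers who want something to paste in, here is a minimal sketch of what such a request might look like. It is not the original code from the post. The URL, the timeout value and the status check are all assumptions for illustration.

```cfml
<!--- Fetch the live home page. The URL and timeout are assumptions. --->
<cfhttp url="https://www.fark.com/" method="get" timeout="15">

<!--- Stop early if the request did not come back with a 200 status --->
<cfif NOT find("200", cfhttp.statusCode)>
    <cfabort showerror="Could not load the home page: #cfhttp.statusCode#">
</cfif>

<!--- Keep the raw HTML in a variable for the regex step --->
<cfset pageHtml = cfhttp.fileContent>
```

Copying the response into its own variable (pageHtml, a name chosen here) keeps the parsing code independent of the cfhttp scope.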

All that content is held in the CFHTTP.FileContent variable. Then we use the REMatchNoCase() function to build an array of stories. We loop through that array and parse out the back references defined by the complex regular expression we used to scrape the page.

[Image: main page cfscript code]
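To make the idea concrete, here is a rough sketch of the match-then-extract pattern. The HTML and the regular expression are made up for illustration and are far simpler than what the real home page needs. The variable names are also assumptions. REMatchNoCase() only returns the matched strings, so the sketch runs REFindNoCase() on each match to get the positions of the back references.

```cfml
<cfscript>
// Hypothetical markup standing in for the real home page HTML
pageHtml = '<tr><td class="tag">Silly</td><td><a href="/comments/1234567">(42)</a></td><td class="headline">Man bites dog</td></tr>'
         & '<tr><td class="tag">Cool</td><td><a href="/comments/7654321">(7)</a></td><td class="headline">Dog forgives man</td></tr>';

// Hypothetical pattern with four groups: tag, story ID, comment count, headline
storyRegex = '<td class="tag">([^<]+)</td><td><a href="/comments/(\d+)">\((\d+)\)</a></td><td class="headline">([^<]+)</td>';

// One array element per story row
rows = REMatchNoCase(storyRegex, pageHtml);

stories = arrayNew(1);
for (i = 1; i <= arrayLen(rows); i++) {
    // Re-run the pattern on a single row to get back reference positions.
    // pos[1]/len[1] describe the whole match; groups start at index 2.
    m = REFindNoCase(storyRegex, rows[i], 1, true);
    story = structNew();
    story.tag      = mid(rows[i], m.pos[2], m.len[2]);
    story.id       = mid(rows[i], m.pos[3], m.len[3]);
    story.comments = mid(rows[i], m.pos[4], m.len[4]);
    story.title    = mid(rows[i], m.pos[5], m.len[5]);
    arrayAppend(stories, story);
}
</cfscript>
```

With the sample markup above, stories ends up holding two structs, one per row.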

We put those variables into a simple table that we can look over to decide which stories we are interested in, with a link that spits out only the pictures from that story’s comments.
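A table like that could be sketched as follows. The sample story, the pics.cfm page name and the id/title URL parameters are all hypothetical. Use whatever names your own pages expect.

```cfml
<!--- Hypothetical sample data standing in for the parsed stories --->
<cfset sample = {tag = "Silly", id = "1234567", comments = "42", title = "Man bites dog"}>
<cfset stories = [sample]>

<cfoutput>
<table>
    <tr><th>Tag</th><th>Story</th><th>Comments</th><th>Pics</th></tr>
    <cfloop array="#stories#" index="story">
    <tr>
        <!--- Escape scraped text before writing it into our own page --->
        <td>#htmlEditFormat(story.tag)#</td>
        <td>#htmlEditFormat(story.title)#</td>
        <td>#htmlEditFormat(story.comments)#</td>
        <!--- pics.cfm is an assumed name for the picture display page --->
        <td><a href="pics.cfm?id=#urlEncodedFormat(story.id)#&amp;title=#urlEncodedFormat(story.title)#">pics only</a></td>
    </tr>
    </cfloop>
</table>
</cfoutput>
```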

[Image: scraped Fark home page]

On to the picture page. We have a story ID that we use to get the page contents from Fark. We take that content and apply two different regular expressions. The first isolates the user-submitted comments; the second extracts only the <img> tags.

[Image: pic page parsing code]
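Here is a simplified sketch of that two-pass approach. The sample markup and both patterns are assumptions made up for this example. The real comment pages are structured differently and need a more involved expression.

```cfml
<cfscript>
// Hypothetical comment page markup; the real page structure will differ
pageHtml = '<div class="header"><img src="/images/farklogo.gif"></div>'
         & '<div class="comment"><p>Nice one</p><img src="http://example.com/cat.jpg"></div>'
         & '<div class="comment"><img src="/images/fark_badge.gif"><IMG SRC="http://example.com/dog.png" alt="dog"></div>';

// First pass: isolate each user comment block (non-greedy match)
commentBlocks = REMatchNoCase('<div class="comment">.*?</div>', pageHtml);

// Second pass: collect every <img> tag found inside those blocks
imgTags = arrayNew(1);
for (i = 1; i <= arrayLen(commentBlocks); i++) {
    found = REMatchNoCase('<img[^>]*>', commentBlocks[i]);
    for (j = 1; j <= arrayLen(found); j++) {
        arrayAppend(imgTags, found[j]);
    }
}
</cfscript>
```

Because the second pattern only runs inside the comment blocks, the logo in the sample header never makes it into imgTags.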

Once the array is built, we simply loop through it and display the images. I filter out any image with “fark” in its tag, because that usually denotes a “Total Fark” member badge.
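The loop and filter might look something like this. The two sample tags are hypothetical, and the check is a plain substring test, so it will also drop any user image that happens to have “fark” somewhere in its URL.

```cfml
<!--- Hypothetical sample tags standing in for the parsed array --->
<cfset imgTags = arrayNew(1)>
<cfset arrayAppend(imgTags, '<img src="http://example.com/cat.jpg">')>
<cfset arrayAppend(imgTags, '<img src="/images/fark_badge.gif">')>

<cfoutput>
<cfloop array="#imgTags#" index="tag">
    <!--- findNoCase() returns 0 when "fark" is absent, so the tag is shown --->
    <cfif NOT findNoCase("fark", tag)>
        #tag#<br>
    </cfif>
</cfloop>
</cfoutput>
```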

We place a simple link to get back to the main page, and possibly pass in the story name to the pic page. There is limitless customization that can be done to this script. I was more concerned about getting the correct data to display.

Download the entire script. Here are the working pages, which include the complex RegEx used to get the desired content.

Disclaimer time. Fark.com is a great site with loads of entertaining content that keeps me coming back. This tutorial is more about applying the principles of regular expressions with ColdFusion to get the data you want than about exploiting Fark. The same principles can be used on other master/detail sites.

I would have had all the code inline, but my plugin for that was acting screwy with the code. Any suggestions for displaying source code in WordPress?


6 Responses to “How to View Only Pictures on Fark.com with ColdFusion”

  1. Paul Says:

    Take a look at SyntaxHighlighter with a Coldfusion brush for code viewing on page: http://code.google.com/p/syntaxhighlighter/

  2. Ryan Stewart Says:

    Haha, this is awesome. It’s like optimized time wasting.

    =Ryan
    ryan@adobe.com

  3. Jason Bartholme Says:

    Indeed it is, Ryan. The cfhttp tag opens a world of productivity and shenanigans. I’m thinking of a blog post series called, “Fun with cfhttp”.

    Thanks for stopping by!

  4. How Says:

    This is very amazing, ColdFusion it completely amazing!

  5. sooraj Says:

    hi,
    u have got a very powerful regular expression…
    good game..

  6. Seo Fleet Says:

    Hi Jason
    you did a very nice job. your blog is very nice. keep up your best.
    Thanks for this info…
