My First Real Ruby Program

I have been trying out Ruby for the past week or so. I finally had a need for a quick script, and I decided to write it in Ruby.

I like the comic strip Dilbert. On my Google Personal Homepage, I used Google’s internal link to add Dilbert to my feeds. However, I really hate the site. The advertisements drive me nuts. So, I decided to see what I could do to create my own feed.

So, my first real Ruby script basically:

  • Prepares a hash table with dates (the last seven days) as keys and page URLs to scrape as values.
  • Connects to www.dilbert.com and gets each day’s page.
  • Screen-scrapes the comic’s image URL from each page and stores it in a second hash table.
  • Uses that hash table to generate an RSS XML document.
  • Connects to a web server via FTP and uploads the RSS file.

It’s not the greatest script, but it works:

#!/usr/bin/ruby

#
# dilbert_raw.rb
#
# Ralph Allan Rice
# ralph.rice @ gmail.com
# June 13, 2006
#
# Generates a raw RSS feed where entries point directly to the raw 'dilbert' strip.
#
# This source code is public domain.  Do what you want with it, but there is no warranty
# and I cannot be held responsible for what you do with this code.
#
#

require 'net/http'
require 'net/ftp'
require 'date'

# set scrape parameters
title = "Dilbert (Raw)"
description="Raw comic strip feed for dilbert.com"
webmaster="[Your Email Address]"
filename="dilbert.rss"
scrape_domain = "www.dilbert.com"
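# The %Y%m%d placeholders in url_format are filled in per-date with strftime below.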
url_format = "/comics/dilbert/archive/dilbert-%Y%m%d.html"

# set up ftp parameters
ftp_domain="[your ftp server]"
ftp_directory="[your remote directory]"
username="[your ftp username]"

# Yeah, yeah, hard-coding a password in the script is bad.
password="[your ftp password]"

# set up timing factors
now = Time.now

# hash tables for the landing pages and the scraped image URLs
landing_map = Hash::new()
image_map = Hash::new()

# Create an array of the last seven dates
dates = [0, 1, 2, 3, 4, 5, 6].collect { |x| now - (x * 86400) }

# Format a URL to scrape for each date.
dates.each { |x| landing_map[x]=x.strftime(url_format) }

h = Net::HTTP.new(scrape_domain, 80)

# Scrape the web, put results in new hash.
landing_map.each_pair do |key, value|
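  # Old two-value form of Net::HTTP#get: resp is the HTTP response, data is the raw page HTML.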
  resp, data = h.get(value, nil)
  if resp.message=="OK"
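    # Scan the HTML for the strip's relative image path (dilbert<digits>.<ext>) and drop duplicate matches.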
    found = data.scan(/\/comics\/dilbert\/archive\/images\/dilbert[0-9]+\..../).uniq
    if found.length > 0
      image_map[key] = "http://" + scrape_domain + found[0]
    end
  end
  # Don't take down the site, we can wait a few cycles.
  sleep(1)

end 

# Now create a rough RSS file.

File.open(filename, "w") do |rss|
  rss.print '<?xml version="1.0" ?>'
  rss.print '<rss version="2.0"><channel>'
  rss.print '<title>'+ title + '</title>'
  rss.print '<description>' + description + '</description>'
  rss.print '<link>http://' + scrape_domain + '/</link>'  # a channel-level <link> is required by RSS 2.0
  rss.print '<pubDate>'+ now.strftime("%a, %d %b %Y %H:%M:%S %z") + '</pubDate>'
  rss.print '<webMaster>' + webmaster + '</webMaster>'

  # Print items
  image_map.keys.sort.each do |keydate|
    rss.print '<item><title>' + keydate.strftime("%A, %B %d, %Y") + '</title>'
    rss.print '<link>' + image_map[keydate] + '</link>'
    rss.print '<pubDate>' + keydate.strftime("%a, %d %b %Y %H:%M:%S %z")+ '</pubDate></item>'
  end

  rss.print '</channel></rss>'
end

# Now FTP the file up to the site

Net::FTP.open(ftp_domain) do |ftp|
  ftp.login(username, password)
  ftp.chdir(ftp_directory)
  ftp.puttextfile(filename)

end

A couple of things you may notice about this code:

  • My use of blocks was not by accident. They are pretty straightforward and act a lot like lambdas in Lisp. I like them. (There is a small illustration after this list.)
  • I did not use any XML libraries to generate the RSS feed. I just like plunking down an XML stream by hand. (For comparison, a library-based sketch follows below.)
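
For anyone who has not seen Ruby blocks before, here is a tiny illustration of what I mean. The names here are just examples, not part of the script:

# A block is an anonymous chunk of code handed to a method,
# much like a lambda in Lisp. These calls do the same thing:
doubled = [1, 2, 3].collect { |x| x * 2 }   # block form, as used in the script
double  = lambda { |x| x * 2 }              # an explicit Proc object
doubled = [1, 2, 3].collect(&double)        # pass the Proc in as a block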

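If I had wanted a library, Ruby’s standard REXML module could build roughly the same channel. This is only a sketch with placeholder values, not what the script above actually does:

require 'rexml/document'

doc = REXML::Document.new
doc << REXML::XMLDecl.new("1.0")
channel = doc.add_element("rss", "version" => "2.0").add_element("channel")
channel.add_element("title").text = "Dilbert (Raw)"
channel.add_element("description").text = "Raw comic strip feed for dilbert.com"

item = channel.add_element("item")
item.add_element("title").text = "[strip date]"
item.add_element("link").text = "[image URL]"

# Write the document with two-space indentation.
File.open("dilbert.rss", "w") { |f| doc.write(f, 2) }
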
Not bad for my first attempt at Ruby.


