Wednesday, January 4, 2012

RSS feed(s) to MySQL database - a script using Simplepie and PHP in Codeigniter

A few weeks ago I was looking for a PHP script that would take RSS feeds and insert them into a MySQL database so I could manipulate them in my environment. There were a few examples here and there that were useful, but not one big example that suited my needs. Here's what I ended up doing.

First I needed to choose a RSS parser. A quick Google search revealed Magpie and Simplepie. I compared the two and ultimately chose Simplepie since it's being actively developed and the documentation seemed thorough and accessible.

After downloading the latest version (1.2.1 at the time of writing), I unzipped it and grabbed the "simplepie.inc" file. This went in the root of my /CodeIgniter/scripts folder along with a new PHP file called "rss.php".

The Code

require_once('simplepie.inc'); 

This allows the script to take advantage of Simplepie. Next:

// Create a new instance of Simplepie.
$feed = new SimplePie();


// There are two options to parse RSS feeds depending on your needs.
// The first is a single url:
$feed->set_feed_url('http://www.yoururl.com/rss.xml');
// The second is an array of feeds, if you're parsing more than one:
$feed->set_feed_url(array(
'http://www.yoururl.com/rss.xml',
'http://www.yoururl.com/rss.xml'
));

// Only use one of the above ways. Comment out or delete the other.

// Set a folder where the cache can reside.
$feed->set_cache_location($_SERVER['DOCUMENT_ROOT'] . '/CodeIgniter/application/cache');

// If you're on Dreamhost like I am, you'll need to modify the cache location to something like this:
$feed->set_cache_location('/home/[UserName]/myFolder/cache');
// This will ensure the output from the RSS feed matches the MySQL encoding, otherwise you end up with characters like this: รข€
$feed->set_output_encoding('ISO-8859-1');
// Initialize Simplepie.
$feed->init();

// This makes sure that the content is sent to the browser as text/html and the UTF-8 character set (since we didn't change it).
$feed->handle_content_type();


Next we need to establish a database connection. For testing purposes, I ran this script on my local machine using XAMPP.

// Make sure to put in the proper server, username, and password information here.
$con = mysql_connect("localhost", "mysql_username", "mysql_password");
// Select the database you want to use.
mysql_select_db("myDB", $con);

// Then check if the connection failed. If it did, die with the error message.
if (!$con)
{ die('Could not connect: ' . mysql_error()); }

Now it's time to loop through the parsed items and load them to the database.

foreach ($feed->get_items() as $item)
{
// Here are some of the various items you can pull from an RSS feed:
        $permalink = $item->get_permalink();
$title = $item->get_title();
$content = $item->get_description();
$date = $item->get_date('Y-m-j g:i:s');

        // This next part will check for duplicated items by hashing the content and comparing it to the hashes in the database. This part is optional, but recommended.
        $content_hash = md5($content);
$result = mysql_query("SELECT * FROM my_posts WHERE content_hash = '" . $content_hash . "'");
$num_rows = mysql_num_rows($result);

if ($num_rows > 0) { }
        // If this item's hash doesn't match any in the database
else
{
// This part will check if the date has been set. Sometimes feeds don't post the date so I found this to be necessary. If it has a date, then INSERT everything, if not, set a default date in MySQL and don't INSERT the date.
                if (isset($date))
{
mysql_query("INSERT INTO my_posts (date, content, content_hash, url, title) VALUES ('" . $date . "', '" . $content . "', '" . $content_hash . "', '" . $permalink . "', '" . $title . "')");
}
else
{
mysql_query("INSERT INTO my_posts (content, content_hash, url, title) VALUES ('" . $content . "', '" . $content_hash . "', '" . $permalink . "', '" . $title . "')");
}
}
}

echo "SCRIPT COMPLETE.";

// Finally, remember to close the connection.
mysql_close($con);

That's it. I can run this, parse feeds, grab the data and INSERT it into the MySQL database. There's probably a more elegant and simple way to do it, but this worked for my needs. From here you can access these posts and manipulate them on your site however you please. I added the ability to comment, rate them, modify and delete, etc.

5 comments:

  1. Hi Sam,

    Thanks for the post.

    What SQL did you use to create the table you inserted your values into? I'm having trouble with getting mine to work and I'm wondering if my table is set up incorrectly.

    Kind regards,
    C

    ReplyDelete
  2. i just modified the code to use it with simplepie 1.3... only need to replace "require_once('autoloader.php');" at the beginning and set the cache locally (not in mysql) and it works super fine!

    thank you

    ReplyDelete
  3. This comment has been removed by the author.

    ReplyDelete
  4. Hi Sam, How heavy is this on the CPU resources? I tried a feed to post WP plugin and killed my hosting..

    ReplyDelete