Re: get content rss feed

tamouse mailing lists <tamouse.lists@xxxxxxxxx> · Fri, 4 May 2012 21:33:19 -0500

On Wed, May 2, 2012 at 7:00 AM, Doeke Wartena <clankill3r@xxxxxxxxx> wrote:
> I try to get the content from the following rss feed
> http://www.adafruit.com/blog/feed/
>
> I want to store it in a database in order to use it for a school assignment.
> If i look in my browser to the feed then i see content and description,
> however if i try to get them with php then description is this:
>
> [description] => SimpleXMLElement Object
>        (
>        )
>
>
> and content is gone.
>
>
> ----------
> $db = dbConnect();
>
> $xml =  getFileContents("http://www.adafruit.com/blog/feed/");
> $xmlTree = new SimpleXMLElement($xml);
>
> for($i = count($xmlTree->channel->item)-1; $i >= 0; $i--) {
> $item = $xmlTree->channel->item[$i];
>  echo "<pre>";
> print_r($item);
> echo "</pre>";
> }
>
> dbClose($db);
>
> ?>
> ----------
>
> this is 1 part of the print_r:
>
> SimpleXMLElement Object
> (
>    [title] => Birth of the ARM: Acorn Archimedes Promo from 1987
>    [link] => http://www.adafruit.com/blog/2012/04/28/birth-of-the-arm-acorn-archimedes-promo-from-1987/
>    [comments] =>
> http://www.adafruit.com/blog/2012/04/28/birth-of-the-arm-acorn-archimedes-promo-from-1987/#comments
>    [pubDate] => Sat, 28 Apr 2012 04:01:35 +0000
>    [category] => Array
>        (
>            [0] => SimpleXMLElement Object
>                (
>                )
>
>            [1] => SimpleXMLElement Object
>                (
>                )
>
>        )
>
>    [guid] => http://www.adafruit.com/blog/?p=30498
>    [description] => SimpleXMLElement Object
>        (
>        )
>
> )
>
> I guess content is gone cause it's like this:
>
> <content:encoded>
>
> And description is gone cause it's like this:
>
> <![CDATA[
>
> But how can i avoid this problem (i'm quite new)?
>
> bye

Hi, Doeke, welcome to PHP!

RSS feed processing can be its own special form of hell, as feed
providers often include a whole set of extra namespaces. Luckily, this
need not be much of a problem, because you can work with those
namespaces as well.

First, notice the beginning of the sample RSS feed:

<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
        xmlns:content="http://purl.org/rss/1.0/modules/content/"
        xmlns:wfw="http://wellformedweb.org/CommentAPI/"
        xmlns:dc="http://purl.org/dc/elements/1.1/"
        xmlns:atom="http://www.w3.org/2005/Atom"
        xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
        xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
        >

You will need to tell your XML parser about those extra
namespaces in order for it to return useful info. Look at the manual
for [SimpleXMLElement::getDocNamespaces](http://us.php.net/manual/en/simplexmlelement.getdocnamespaces.php),
which will tell you which namespaces the document declares.
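
As a rough sketch of what that looks like, here is getDocNamespaces run
against a tiny inline feed. The inline XML is made up for illustration
and only stands in for the real feed; it declares just two of the
namespaces shown above.

```php
<?php
// Hypothetical stand-in for the real feed: only the root element matters here.
$xml = <<<'XML'
<rss version="2.0"
     xmlns:content="http://purl.org/rss/1.0/modules/content/"
     xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>Sample</title>
  </channel>
</rss>
XML;

$feed = new SimpleXMLElement($xml);

// Returns an array of prefix => namespace URI declared on the root element.
$namespaces = $feed->getDocNamespaces();

foreach ($namespaces as $prefix => $uri) {
    echo $prefix, ' => ', $uri, "\n";
}
```

The URI listed for the "content" prefix is the one you need when you go
looking for content:encoded.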

You'll notice that one of the namespaces above is "content", and if
you look in the RSS feed, you'll see a
<content:encoded>....</content:encoded> in each feed item. You need to
retrieve this by specifying the content namespace for the element
"encoded" in the item.

To do that, you'll need to register the namespace for XPath queries using
[SimpleXMLElement::registerXPathNamespace](http://us.php.net/manual/en/simplexmlelement.registerxpathnamespace.php).
Once you do that, you'll be able to retrieve the content:encoded
element via the xpath method.
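
Something along these lines should work; treat it as a sketch rather
than a finished script. The inline feed and the $rows array are
assumptions made up for illustration (you would use the string from
your own getFileContents call and insert into your database instead).
It also covers the empty description: print_r does not show CDATA
text, but casting the element to a string returns it.

```php
<?php
// Hypothetical one-item feed standing in for the real one.
$xml = <<<'XML'
<rss version="2.0" xmlns:content="http://purl.org/rss/1.0/modules/content/">
  <channel>
    <item>
      <title>First post</title>
      <description><![CDATA[Short <b>summary</b>]]></description>
      <content:encoded><![CDATA[<p>Full body</p>]]></content:encoded>
    </item>
  </channel>
</rss>
XML;

$feed = new SimpleXMLElement($xml);
$rows = [];

foreach ($feed->channel->item as $item) {
    // Bind the "content" prefix to its URI for XPath queries on this item.
    $item->registerXPathNamespace('content', 'http://purl.org/rss/1.0/modules/content/');

    // Relative path: looks for content:encoded children of this item.
    $encoded = $item->xpath('content:encoded');

    $rows[] = [
        'title'       => (string) $item->title,
        // The string cast pulls the text out of the CDATA section.
        'description' => (string) $item->description,
        'content'     => $encoded ? (string) $encoded[0] : '',
    ];
}

print_r($rows);
```

The namespace is registered on each $item rather than once on the root
because the xpath call is made on the item itself.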

-- 
PHP General Mailing List (http://www.php.net/)