Re: URL restriction on XML file

That's because the parser splits the character data at entity boundaries, so for

http://feeds.example.com/?rid=318045f7e13e0b66&cat=48cba686fe041718&f=1

characterData() will be called 5 times:

http://feeds.example.com/?rid=318045f7e13e0b66
&
cat=48cba686fe041718
&
f=1
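
To see the splitting for yourself, here is a minimal sketch. It assumes PHP 5.3 or later for the closure, and the one-element document and the $chunks variable are made up for illustration. It records every call to the character data handler. The exact number of pieces may depend on the XML library your PHP was built against, but joining them should give back the full, decoded URL.

```php
<?php
// Hypothetical one-element document; the URL is the one from the thread.
$xml = '<link>http://feeds.example.com/?rid=318045f7e13e0b66&amp;cat=48cba686fe041718&amp;f=1</link>';

$chunks = array();  // one entry per call to the character data handler

$parser = xml_parser_create();
xml_set_character_data_handler($parser, function ($parser, $data) use (&$chunks) {
    $chunks[] = $data;  // record each piece exactly as it is delivered
});
xml_parse($parser, $xml, true);

print_r($chunks);                  // the pieces, in delivery order
echo implode('', $chunks), "\n";   // the pieces joined back together
```

If print_r() shows more than one piece, a handler that assigns with = keeps only the last one, which matches the 'f=1' you are seeing.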

The solution is inlined below, as comments in your script.

Roger Thomas wrote:
I have a short script to parse my XML file. The parsing produces no errors and all output looks good, EXCEPT that URL links are truncated IF they contain the '&' character.

My XML file looks like this:
--- start of XML ---
<?xml version="1.0" encoding="iso-8859-1"?>
<rss version="2.0">
<channel>
<title>Test News .Net - Newspapers on the Net</title>
<copyright>Small News Network.com</copyright>
<link>http://www.example.com/</link>
<description>Continuously updating Example News.</description>
<language>en-us</language>
<pubDate>Tue, 29 Mar 2005 18:01:01 -0600</pubDate>
<lastBuildDate>Tue, 29 Mar 2005 18:01:01 -0600</lastBuildDate>
<ttl>30</ttl>
<item>
<title>Group buys SunGard for US$10.4bil</title>
<link>http://feeds.example.com/?rid=318045f7e13e0b66&amp;cat=48cba686fe041718&amp;f=1</link>
<description>NEW YORK: A group of seven private equity investment firms agreed yesterday to buy financial technology company SunGard Data Systems Inc in a deal worth US$10.4bil plus debt, making it the biggest lev...</description>
<source url="http://biz.theexample.com/">The Paper</source>
</item>
<item>
<title>Strong quake hits Indonesia coast</title>
<link>http://feeds.example.com/news/world/quake.html</link>
<description>a &quot;widely destructive tsunami&quot; and the quake was felt as far away as Malaysia.</description>
<source url="http://biz.theexample.com.net/">The Paper</source>
</item>
<item>
<title>Final News</title>
<link>http://feeds.example.com/?id=abcdef&amp;cat=somecat</link>
<description>We are going to expect something new this weekend ...</description>
<source url="http://biz.theexample.com/">The Paper</source>
</item>
</channel>
</rss>
--- end of XML ---

For the sake of testing, my script only prints the URL link of each news item above. I got this:
f=1
http://feeds.example.com/news/world/quake.html
cat=somecat

The output on line 1 is truncated to 'f=1' and the output on line 3 is truncated to 'cat=somecat', i.e. the script only kept the last parameter of the URL link. The output on line 2 is correct since it has NO parameters.

I am not sure what I have done wrong in my script. Is it because the RSS spec says that you cannot have parameters in a URL? Please advise.

-- start of script --
<?
$file = "test.xml";
$currentTag = "";

function startElement($parser, $name, $attrs) {
    global $currentTag;
    $currentTag = $name;
}

function endElement($parser, $name) {
    global $currentTag, $TITLE, $URL, $start;

    switch ($currentTag) {
        case "ITEM":
            $start = 0;
        case "LINK":
             if ($start == 1)
                 #print "<A HREF = \"".$URL."\">$TITLE</A><BR>";
                 print "$URL"."<BR>";
             break;
    }
   $currentTag = "";

// Also reset the other variables here: $URL = ''; $TITLE = '';

}

function characterData($parser, $data) {
    global $currentTag, $TITLE, $URL, $start;

    switch ($currentTag) {
        case "ITEM":
            $start = 1;
        case "TITLE":
           $TITLE = $data;

// append instead: $TITLE .= $data;

           break;
        case "LINK":
            $URL = $data;

// append instead: $URL .= $data;

// Warning: entities are already decoded at this point, so you will receive &, not &amp;

            break;
    }
}

$xml_parser = xml_parser_create();
xml_set_element_handler($xml_parser, "startElement", "endElement");
xml_set_character_data_handler($xml_parser, "characterData");

if (!($fp = fopen($file, "r"))) {
    die("Cannot locate XML data file: $file");
}

while ($data = fread($fp, 4096)) {
    if (!xml_parse($xml_parser, $data, feof($fp))) {
        die(sprintf("XML error: %s at line %d",
            xml_error_string(xml_get_error_code($xml_parser)),
            xml_get_current_line_number($xml_parser)));
    }
}

xml_parser_free($xml_parser);

?>
-- end of script --

TIA.
Roger
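
Putting the inlined comments together, here is a minimal sketch of one way to restructure the handlers. It is not a drop-in replacement for your script: the function name extractItemLinks(), the $buffer and $inItem variables, and the shortened sample feed are all made up for illustration, and it assumes PHP 7 or later for the type declarations. The idea is to append in the character data handler and only read the text once the closing tag arrives.

```php
<?php
// Sketch: buffer the text of each element, read it when the element closes.
// extractItemLinks() is a hypothetical helper, not part of PHP.
function extractItemLinks(string $xml): array
{
    $links  = [];     // URLs collected from <link> elements inside <item>
    $buffer = '';     // text of the element currently being read
    $inItem = false;  // true between <item> and </item>

    $parser = xml_parser_create();  // case folding is on by default: names arrive uppercased

    xml_set_element_handler(
        $parser,
        function ($parser, $name, $attrs) use (&$buffer, &$inItem) {
            if ($name === 'ITEM') {
                $inItem = true;
            }
            $buffer = '';  // every element starts with an empty buffer
        },
        function ($parser, $name) use (&$links, &$buffer, &$inItem) {
            if ($name === 'LINK' && $inItem) {
                $links[] = trim($buffer);  // the complete, decoded URL
            } elseif ($name === 'ITEM') {
                $inItem = false;
            }
            $buffer = '';
        }
    );

    xml_set_character_data_handler(
        $parser,
        function ($parser, $data) use (&$buffer) {
            $buffer .= $data;  // append: the handler may fire several times per element
        }
    );

    if (!xml_parse($parser, $xml, true)) {
        throw new RuntimeException(sprintf(
            'XML error: %s at line %d',
            xml_error_string(xml_get_error_code($parser)),
            xml_get_current_line_number($parser)
        ));
    }

    return $links;
}

// Shortened version of the feed from the question.
$xml = <<<'XML'
<rss version="2.0"><channel>
<link>http://www.example.com/</link>
<item><title>Final News</title>
<link>http://feeds.example.com/?id=abcdef&amp;cat=somecat</link></item>
</channel></rss>
XML;

foreach (extractItemLinks($xml) as $url) {
    // The parser hands back a bare &, so re-encode it when writing HTML.
    echo htmlspecialchars($url), "<BR>\n";
}
```

The channel-level link is skipped because $inItem is still false when it closes. The htmlspecialchars() call is only needed because the output goes into an HTML page; if you store the URL or fetch it, use the decoded string as it is.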

