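The text of your first link is:

http://feeds.example.com/?rid=318045f7e13e0b66&amp;cat=48cba686fe041718&amp;f=1

The parser does not hand this to your handler in one piece. characterData() will be called 5 times, once per chunk:

http://feeds.example.com/?rid=318045f7e13e0b66
&
cat=48cba686fe041718
&
f=1

Because your handler assigns with = on every call, each chunk overwrites the previous one and only the last chunk survives.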
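If you want to see the chunking for yourself, here is a minimal sketch that records every call to the character data handler. The one-element $xml string and the collectChunk() handler name are made up for illustration. The exact number of chunks may depend on the PHP version and the XML library underneath, so the sketch only shows that the pieces join back into the full URL.

```php
<?php
// Hypothetical one-element document; & is written as &amp; so the XML is well-formed.
$xml = '<link>http://feeds.example.com/?rid=318045f7e13e0b66&amp;cat=48cba686fe041718&amp;f=1</link>';

$chunks = array();  // every piece of text the parser hands us, in order

function collectChunk($parser, $data) {
    global $chunks;
    $chunks[] = $data;
}

$parser = xml_parser_create();
xml_set_character_data_handler($parser, "collectChunk");

if (!xml_parse($parser, $xml, true)) {
    die(xml_error_string(xml_get_error_code($parser)));
}

print_r($chunks);                 // the individual pieces
$joined = implode('', $chunks);   // appending the pieces restores the whole text
print $joined . "\n";
```

Note that the joined text contains a plain &, not &amp;, since the parser has already decoded the entity.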
Solution is inlined below
Roger Thomas wrote:
I have a short script to parse my XML file. The parsing produces no error and all output looks good, EXCEPT that URL links are truncated if they contain the '&' character.
My XML file looks like this:

--- start of XML ---
<?xml version="1.0" encoding="iso-8859-1"?>
<rss version="2.0">
  <channel>
    <title>Test News .Net - Newspapers on the Net</title>
    <copyright>Small News Network.com</copyright>
    <link>http://www.example.com/</link>
    <description>Continuously updating Example News.</description>
    <language>en-us</language>
    <pubDate>Tue, 29 Mar 2005 18:01:01 -0600</pubDate>
    <lastBuildDate>Tue, 29 Mar 2005 18:01:01 -0600</lastBuildDate>
    <ttl>30</ttl>
    <item>
      <title>Group buys SunGard for US$10.4bil</title>
      <link>http://feeds.example.com/?rid=318045f7e13e0b66&amp;cat=48cba686fe041718&amp;f=1</link>
      <description>NEW YORK: A group of seven private equity investment firms agreed yesterday to buy financial technology company SunGard Data Systems Inc in a deal worth US$10.4bil plus debt, making it the biggest lev...</description>
      <source url="http://biz.theexample.com/">The Paper</source>
    </item>
    <item>
      <title>Strong quake hits Indonesia coast</title>
      <link>http://feeds.example.com/news/world/quake.html</link>
      <description>a "widely destructive tsunami" and the quake was felt as far away as Malaysia.</description>
      <source url="http://biz.theexample.com.net/">The Paper</source>
    </item>
    <item>
      <title>Final News</title>
      <link>http://feeds.example.com/?id=abcdef&amp;cat=somecat</link>
      <description>We are going to expect something new this weekend ...</description>
      <source url="http://biz.theexample.com/">The Paper</source>
    </item>
  </channel>
</rss>
--- end of XML ---
For the sake of testing, my script only prints the URL link of each news item above. I got these:

f=1
http://feeds.example.com/news/world/quake.html
cat=somecat

The output for line 1 is truncated to 'f=1' and the output for line 3 is truncated to 'cat=somecat', i.e. the script only took the last parameter of the URL. The output for line 2 is correct since it has NO parameters.

I am not sure what I have done wrong in my script. Is it because the RSS spec says that you cannot have parameters in a URL? Please advise.
-- start of script --
<?
$file = "test.xml";
$currentTag = "";

function startElement($parser, $name, $attrs) {
    global $currentTag;
    $currentTag = $name;
}

function endElement($parser, $name) {
    global $currentTag, $TITLE, $URL, $start;

    switch ($currentTag) {
        case "ITEM":
            $start = 0;
        case "LINK":
            if ($start == 1)
                #print "<A HREF = \"".$URL."\">$TITLE</A><BR>";
                print "$URL"."<BR>";
            break;
    }
    $currentTag = "";

    // Reset also other variables: $URL = ''; $TITLE = '';

}
function characterData($parser, $data) {
    global $currentTag, $TITLE, $URL, $start;

    switch ($currentTag) {
        case "ITEM":
            $start = 1;
        case "TITLE":
            $TITLE = $data;

            // append instead: $TITLE .= $data;

            break;
        case "LINK":
            $URL = $data;

            // append instead: $URL .= $data;

            // Warning: entities are decoded at this point, you will receive &, not &amp;

            break;
    }
}
$xml_parser = xml_parser_create();
xml_set_element_handler($xml_parser, "startElement", "endElement");
xml_set_character_data_handler($xml_parser, "characterData");

if (!($fp = fopen($file, "r"))) {
    die("Cannot locate XML data file: $file");
}

while ($data = fread($fp, 4096)) {
    if (!xml_parse($xml_parser, $data, feof($fp))) {
        die(sprintf("XML error: %s at line %d",
            xml_error_string(xml_get_error_code($xml_parser)),
            xml_get_current_line_number($xml_parser)));
    }
}

xml_parser_free($xml_parser);

?>
-- end of script --
TIA. Roger
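Putting the inline comments together, here is one possible way to arrange the handlers. It is only a sketch, not the only fix: the text of each element is appended to a buffer in characterData() and the URL is used only in endElement(), once the element is complete. The $inItem, $buffer and $urls names are my own, and the feed is a cut-down copy of yours held in a string so the example runs without test.xml.

```php
<?php
// Hypothetical sample feed, cut down from the one in the question.
$xml = <<<'XML'
<?xml version="1.0" encoding="iso-8859-1"?>
<rss version="2.0">
  <channel>
    <link>http://www.example.com/</link>
    <item>
      <title>Group buys SunGard for US$10.4bil</title>
      <link>http://feeds.example.com/?rid=318045f7e13e0b66&amp;cat=48cba686fe041718&amp;f=1</link>
    </item>
    <item>
      <title>Strong quake hits Indonesia coast</title>
      <link>http://feeds.example.com/news/world/quake.html</link>
    </item>
    <item>
      <title>Final News</title>
      <link>http://feeds.example.com/?id=abcdef&amp;cat=somecat</link>
    </item>
  </channel>
</rss>
XML;

$inItem = false;    // true while the parser is inside an <item>
$buffer = '';       // text collected for the element currently open
$urls   = array();  // one complete URL per item

function startElement($parser, $name, $attrs) {
    global $inItem, $buffer;
    if ($name == 'ITEM') {
        $inItem = true;
    }
    $buffer = '';   // start a fresh buffer for every element
}

function characterData($parser, $data) {
    global $buffer;
    $buffer .= $data;   // append: the text may arrive in several chunks
}

function endElement($parser, $name) {
    global $inItem, $buffer, $urls;
    if ($name == 'LINK' && $inItem) {
        $urls[] = trim($buffer);   // the element is complete, so the buffer is too
    } elseif ($name == 'ITEM') {
        $inItem = false;
    }
    $buffer = '';
}

$xml_parser = xml_parser_create();
xml_set_element_handler($xml_parser, "startElement", "endElement");
xml_set_character_data_handler($xml_parser, "characterData");

if (!xml_parse($xml_parser, $xml, true)) {
    die(sprintf("XML error: %s at line %d",
        xml_error_string(xml_get_error_code($xml_parser)),
        xml_get_current_line_number($xml_parser)));
}

foreach ($urls as $url) {
    // The handler received a decoded "&", so escape it again for HTML output.
    print '<A HREF="' . htmlspecialchars($url) . '">' . htmlspecialchars($url) . "</A><BR>\n";
}
```

The channel-level link is skipped because it is not inside an item. Since element names are upper-cased by default, the handlers compare against 'ITEM' and 'LINK', the same as in your script.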
-- PHP General Mailing List (http://www.php.net/) To unsubscribe, visit: http://www.php.net/unsub.php