Re: URL restriction on XML file

Hi Marek,
Thank you for the solution.

--
Roger

Quoting Marek Kilimajer <lists@xxxxxxxxxxxxx>:

> That's because the character data is split at entity boundaries, so for
> 
> http://feeds.example.com/?rid=318045f7e13e0b66&amp;cat=48cba686fe041718&amp;f=1
> 
> characterData() will be called 5 times:
> 
> http://feeds.example.com/?rid=318045f7e13e0b66
> &
> cat=48cba686fe041718
> &
> f=1
> 
> The solution is inlined below.
> 
> Roger Thomas wrote:
> > I have a short script to parse my XML file. The parsing produces no errors
> > and all output looks good, EXCEPT that URL links are truncated if they
> > contain the '&amp;' entity.
> > 
> > My XML file looks like this:
> > --- start of XML ---
> > <?xml version="1.0" encoding="iso-8859-1"?>
> > <rss version="2.0">
> > <channel>
> > <title>Test News .Net - Newspapers on the Net</title>
> > <copyright>Small News Network.com</copyright>
> > <link>http://www.example.com/</link>
> > <description>Continuously updating Example News.</description>
> > <language>en-us</language>
> > <pubDate>Tue, 29 Mar 2005 18:01:01 -0600</pubDate>
> > <lastBuildDate>Tue, 29 Mar 2005 18:01:01 -0600</lastBuildDate>
> > <ttl>30</ttl>
> > <item>
> > <title>Group buys SunGard for US$10.4bil</title>
> > <link>http://feeds.example.com/?rid=318045f7e13e0b66&amp;cat=48cba686fe041718&amp;f=1</link>
> > <description>NEW YORK: A group of seven private equity investment firms
> > agreed yesterday to buy financial technology company SunGard Data Systems Inc
> > in a deal worth US$10.4bil plus debt, making it the biggest
> > lev...</description>
> > <source url="http://biz.theexample.com/">The Paper</source>
> > </item>
> > <item>
> > <title>Strong quake hits Indonesia coast</title>
> > <link>http://feeds.example.com/news/world/quake.html</link>
> > <description>a &quot;widely destructive tsunami&quot; and the quake was
> > felt as far away as Malaysia.</description>
> > <source url="http://biz.theexample.com.net/">The Paper</source>
> > </item>
> > <item>
> > <title>Final News</title>
> > <link>http://feeds.example.com/?id=abcdef&amp;cat=somecat</link>
> > <description>We are going to expect something new this weekend
> > ...</description>
> > <source url="http://biz.theexample.com/">The Paper</source>
> > </item>
> > </channel>
> > </rss>
> > --- end of XML ---
> > 
> > For the sake of testing, my script only prints the URL link of each news
> > item above. I got this:
> > f=1
> > http://feeds.example.com/news/world/quake.html
> > cat=somecat
> > 
> > The output on line 1 is truncated to 'f=1' and the output on line 3 is
> > truncated to 'cat=somecat', i.e. the script only kept the last parameter of
> > the URL. The output on line 2 is correct since it has NO parameters.
> > 
> > I am not sure what I have done wrong in my script. Is it because the RSS spec
> > says that you cannot have parameters in a URL? Please advise.
> > 
> > -- start of script --
> > <?php
> > $file = "test.xml";
> > $currentTag = "";
> > 
> > function startElement($parser, $name, $attrs) {
> >     global $currentTag;
> >     $currentTag = $name;
> > }
> > 
> > function endElement($parser, $name) {
> >     global $currentTag, $TITLE, $URL, $start;
> > 
> >     switch ($currentTag) {
> >         case "ITEM":
> >             $start = 0;
> >         case "LINK":
> >              if ($start == 1)
> >                  #print "<A HREF = \"".$URL."\">$TITLE</A><BR>";
> >                  print "$URL"."<BR>";
> >              break;
> >     }
> >    $currentTag = "";
> 
> // Reset also other variables:
>     $URL = '';
>     $TITLE = '';
> 
> > }
> > 
> > function characterData($parser, $data) {
> >     global $currentTag, $TITLE, $URL, $start;
> > 
> >     switch ($currentTag) {
> >         case "ITEM":
> >             $start = 1;
> >         case "TITLE":
> >            $TITLE = $data;
> 
> // append instead:
> $TITLE .= $data;
> 
> >            break;
> >         case "LINK":
> >             $URL = $data;
> 
> // append instead:
> $URL .= $data;
> 
> // Warning: entities are already decoded at this point, so you will receive
> // & rather than &amp;
> 
> >             break;
> >     }
> > }
> > 
> > $xml_parser = xml_parser_create();
> > xml_set_element_handler($xml_parser, "startElement", "endElement");
> > xml_set_character_data_handler($xml_parser, "characterData");
> > 
> > if (!($fp = fopen($file, "r"))) {
> >     die("Cannot locate XML data file: $file");
> > }
> > 
> > while ($data = fread($fp, 4096)) {
> >     if (!xml_parse($xml_parser, $data, feof($fp))) {
> >         die(sprintf("XML error: %s at line %d",
> >             xml_error_string(xml_get_error_code($xml_parser)),
> >             xml_get_current_line_number($xml_parser)));
> >     }
> > }
> > 
> > xml_parser_free($xml_parser);
> > 
> > ?>
> > -- end of script --
> > 
> > TIA.
> > Roger
> > 
> > 
> > ---------------------------------------------------
> > Sign Up for free Email at http://ureg.home.net.my/
> > ---------------------------------------------------
> > 
> 
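
For reference, the inlined changes above can be consolidated into one self-contained sketch. It assumes PHP 7 or later and uses closures instead of globals; collectLinks() is a hypothetical helper name and not part of the original script. The idea is the one Marek describes: append in the character data handler, and only use the collected text once the end handler fires.

```php
<?php
// Sketch only: collectLinks() is a hypothetical helper, not from the thread.
// It returns the text of every <link> that sits inside an <item>.
function collectLinks(string $xml): array
{
    $links  = [];     // URLs found inside <item> elements
    $buffer = '';     // text accumulated for the element currently open
    $inItem = false;  // true while between <item> and </item>

    $parser = xml_parser_create();

    // Case folding is on by default, so element names arrive upper-cased.
    xml_set_element_handler(
        $parser,
        function ($p, $name, $attrs) use (&$buffer, &$inItem) {
            if ($name === 'ITEM') {
                $inItem = true;
            }
            $buffer = '';  // start a fresh buffer for every element
        },
        function ($p, $name) use (&$links, &$buffer, &$inItem) {
            if ($name === 'LINK' && $inItem) {
                $links[] = trim($buffer);  // the complete, decoded URL
            } elseif ($name === 'ITEM') {
                $inItem = false;
            }
            $buffer = '';
        }
    );

    xml_set_character_data_handler(
        $parser,
        function ($p, $data) use (&$buffer) {
            // Append: the handler may fire several times per element,
            // for example once per entity boundary.
            $buffer .= $data;
        }
    );

    if (!xml_parse($parser, $xml, true)) {
        throw new RuntimeException(sprintf(
            'XML error: %s at line %d',
            xml_error_string(xml_get_error_code($parser)),
            xml_get_current_line_number($parser)
        ));
    }

    return $links;
}
```

Called as collectLinks(file_get_contents("test.xml")), this should return one full URL per item, with &amp; already decoded to &. The channel-level <link> is skipped because it is not inside an <item>.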
> 
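
Because the handler receives the decoded & rather than &amp;, the URL has to be escaped again before it is printed into HTML. A minimal sketch, assuming the output is an HTML page and the strings are valid UTF-8; linkHtml() is a hypothetical helper name:

```php
<?php
// Sketch only: linkHtml() is a hypothetical helper, not from the thread.
// It re-encodes &, <, > and quotes so the link is safe inside an attribute.
function linkHtml(string $url, string $title): string
{
    return sprintf(
        '<a href="%s">%s</a><br>',
        htmlspecialchars($url, ENT_QUOTES, 'UTF-8'),
        htmlspecialchars($title, ENT_QUOTES, 'UTF-8')
    );
}
```

This would take the place of the commented-out print line in endElement(), for example print linkHtml($URL, $TITLE);.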





-- 
PHP General Mailing List (http://www.php.net/)
To unsubscribe, visit: http://www.php.net/unsub.php

