On 30 September 2011 18:22, Ron Piggott <ron.php@xxxxxxxxxxxxxxxxxx> wrote:
>
> -----Original Message----- From: Richard Quadling
> Sent: Friday, September 30, 2011 12:31 PM
> To: Ron Piggott
> Cc: php-general@xxxxxxxxxxxxx
> Subject: Re: RSS Feed Accented Characters
>
> On 30 September 2011 17:26, Ron Piggott <ron.php@xxxxxxxxxxxxxxxxxx> wrote:
>>
>> I am trying to set up an RSS Feed in the Spanish language using a PHP cron
>> job. I am unsure of how to deal with accented letters.
>>
>> An example:
>>
>> This syntax:
>>
>> <?php
>>
>> $rss_content .= "<description>" . htmlentities("El Versículo del Día") .
>> "</description>\r\n";
>>
>> ?>
>>
>> Outputs:
>>
>> <description>El Vers&iacute;culo del D&iacute;a</description>
>>
>> When I use an RSS Feed validator I receive the error message
>>
>> This feed does not validate.
>>
>> - line 24, column 20: XML parsing error: <unknown>:24:20: undefined
>> entity
>>
>> I suspect the “;” is the issue, although it is needed for the accented
>> letters. If I don’t use htmlentities() the accented characters can’t be
>> viewed, they become a “?” How should I proceed?
>>
>> Ron
>
> Make sure you have ...
>
> <?xml version="1.0" encoding="UTF-8"?>
>
> as the first line of the output. That tells the reader that the file
> is a UTF-8 encoded file. Also, if you are sending HTTP headers, make sure
> that they say the encoding is UTF-8 and not a codepage.
>
> Go UTF-8 everywhere.
>
>
> --
> Richard Quadling
> Twitter : EE : Zend : PHPDoc
> @RQuadling : e-e.com/M_248814.html : bit.ly/9O8vFY : bit.ly/lFnVea
>
>
> Hi Richard:
>
> Having " <?xml version="1.0" encoding="UTF-8"?> " as the starting
> line didn't correct the problem.
>
> The RSS Feed is @
> http://www.elversiculodeldia.info/peticiones-de-rezo-rss.xml
>
> There are a variety of errors related to accented characters while using a
> feed validator
> http://validator.w3.org/feed/check.cgi?url=http%3A%2F%2Fwww.elversiculodeldia.info%2Fpeticiones-de-rezo-rss.xml
>
> - Also, while viewing the feed in Firefox, once the first accented character
> is displayed none of the rest of the feed is visible, except by right
> clicking and "view source"
>
> The RSS Feed content will be populated by a database query. The database
> columns are set to utf8_unicode_ci
>
> How should I proceed?
> Ron
>

The byte sequence that is being received for the í is just 0xED. You can
check by saving the feed locally and inspecting the bytes:

php -r "file_put_contents('a.rss', file_get_contents('http://www.elversiculodeldia.info/peticiones-de-rezo-rss.xml'));"

This is NOT UTF-8 encoded data; it is (most likely) ISO-8859-1, i.e. Latin-1.

So as I see it you have two choices. Either use

<?xml version="1.0" encoding="ISO-8859-1"?>

as the XML declaration, or convert the encoded data to UTF-8.

It also means that the data in the SQL server is NOT UTF-8 and will need
to be converted as well. I would recommend doing that first. That will mean
reading the data as ISO-8859-1, converting it to UTF-8, and then saving
it again.

I'd also be looking at the app that inputs the data into the DB initially.

To convert the text, here are 2 examples. I'm sure there are more ways.

<?php
// Assumes this script is saved as ISO-8859-1, so each í below is the single byte 0xED.
$iso_text = 'El Versículo del Día: Pray For Others: Incoming Prayer Requests';

// Example 1: utf8_encode() converts ISO-8859-1 to UTF-8.
$utf_8_text = utf8_encode($iso_text);
var_dump($iso_text, $utf_8_text);

// Example 2: iconv() with explicit source and target encodings.
$utf_8_text = iconv('ISO-8859-1', 'UTF-8', $iso_text);
var_dump($iso_text, $utf_8_text);
?>

outputs ...

string(63) "El Vers�culo del D�a: Pray For Others: Incoming Prayer Requests"
string(65) "El Versículo del Día: Pray For Others: Incoming Prayer Requests"
string(63) "El Vers�culo del D�a: Pray For Others: Incoming Prayer Requests"
string(65) "El Versículo del Día: Pray For Others: Incoming Prayer Requests"

Notice that the correct strings are 2 bytes longer? In ISO-8859-1 each í
is the single byte 0xED; in UTF-8 the same character (U+00ED) is encoded
as the two bytes 0xC3 0xAD.

--
Richard Quadling
Twitter : EE : Zend : PHPDoc
@RQuadling : e-e.com/M_248814.html : bit.ly/9O8vFY : bit.ly/lFnVea

--
PHP General Mailing List (http://www.php.net/)
To unsubscribe, visit: http://www.php.net/unsub.php
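
The advice in this thread (convert to UTF-8, declare UTF-8, and stop using
htmlentities()) can be sketched roughly as below. This is only an
illustration under stated assumptions, not the code used on the site:
to_xml_text() is a hypothetical helper name, and it assumes that any input
which is not already valid UTF-8 is ISO-8859-1. It uses iconv() because
utf8_encode() is deprecated as of PHP 8.2, and htmlspecialchars() because
htmlentities() emits named HTML entities such as &iacute; that XML does not
define, which is what triggers the "undefined entity" error.

```php
<?php
// Hypothetical helper (not from the thread): returns UTF-8 text that is
// safe to place inside an XML element such as <description>.
function to_xml_text(string $text): string
{
    // preg_match() with the /u modifier returns false on invalid UTF-8.
    // Assumption: anything that is not valid UTF-8 is ISO-8859-1.
    if (preg_match('//u', $text) !== 1) {
        $text = iconv('ISO-8859-1', 'UTF-8', $text);
    }

    // Escape only the XML-special characters (&, <, >, quotes).
    // Accented letters are left as plain UTF-8 bytes.
    return htmlspecialchars($text, ENT_QUOTES | ENT_XML1, 'UTF-8');
}

// Byte escapes keep this example independent of the file's own encoding.
$latin1 = "El Vers\xEDculo del D\xEDa";          // í as the single byte 0xED
$utf8   = "El Vers\xC3\xADculo del D\xC3\xADa";  // í as the bytes 0xC3 0xAD

$rss_content  = '<?xml version="1.0" encoding="UTF-8"?>' . "\r\n";
$rss_content .= '<description>' . to_xml_text($latin1) . '</description>' . "\r\n";

echo $rss_content;
```

Guessing the source encoding like this is a stopgap. Once the database and
the application that writes to it are UTF-8 end to end, the conversion
branch should never run and only the escaping step remains.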