www.TheVerseOfTheDay.info
-----Original Message-----
From: Richard Quadling
Sent: Friday, September 30, 2011 2:53 PM
To: Ron Piggott
Cc: php-general@xxxxxxxxxxxxx
Subject: Re: RSS Feed Accented Characters
On 30 September 2011 18:22, Ron Piggott <ron.php@xxxxxxxxxxxxxxxxxx> wrote:
-----Original Message-----
From: Richard Quadling
Sent: Friday, September 30, 2011 12:31 PM
To: Ron Piggott
Cc: php-general@xxxxxxxxxxxxx
Subject: Re: RSS Feed Accented Characters
On 30 September 2011 17:26, Ron Piggott <ron.php@xxxxxxxxxxxxxxxxxx> wrote:
I am trying to set up an RSS feed in the Spanish language using a PHP cron
job. I am unsure of how to deal with accented letters.
An example:
This syntax:
<?php
$rss_content .= "<description>" . htmlentities("El Versículo del Día") .
"</description>\r\n";
?>
Outputs:
<description>El Vers&iacute;culo del D&iacute;a</description>
When I use an RSS feed validator I receive this error message:
This feed does not validate.
line 24, column 20: XML parsing error: <unknown>:24:20: undefined entity
I suspect the “;” is the issue, although it is needed for the accented
letters. If I don’t use htmlentities(), the accented characters can’t be
viewed; they become a “?”. How should I proceed?
Ron
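The "undefined entity" error arises because XML predefines only five named entities (&lt;, &gt;, &amp;, &quot; and &apos;), while htmlentities() emits HTML-only entities such as &iacute;. A minimal sketch, assuming PHP 7+ (ENT_XML1 itself needs 5.4+) and a script file saved as UTF-8: escape only the XML-special characters and let accented letters pass through as literal UTF-8. The helper name xml_text() is hypothetical.

```php
<?php
// Hypothetical helper: escape text for an XML element without
// turning accented letters into HTML-only entities such as &iacute;.
function xml_text(string $text): string
{
    // ENT_XML1 uses XML's predefined entities; only <, >, &, " and '
    // are escaped, so "í" passes through as the UTF-8 bytes 0xC3 0xAD.
    return htmlspecialchars($text, ENT_QUOTES | ENT_XML1, 'UTF-8');
}

$rss_content = '';
$rss_content .= "<description>" . xml_text("El Versículo del Día") . "</description>\r\n";
echo $rss_content;
```

If the input is not valid UTF-8 (for example Latin-1 bytes such as 0xED), htmlspecialchars() returns an empty string unless ENT_SUBSTITUTE is added, which makes an encoding mismatch easy to spot.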
Make sure you have ...
<?xml version="1.0" encoding="UTF-8"?>
as the very first line of the output. That tells the reader that the file
is UTF-8 encoded. Also, if you are sending HTTP headers, make sure that
they declare the encoding as UTF-8 and not a code page.
Go UTF-8 everywhere.
--
Richard Quadling
Twitter : EE : Zend : PHPDoc
@RQuadling : e-e.com/M_248814.html : bit.ly/9O8vFY : bit.ly/lFnVea
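The advice above (declaration first, headers agreeing with it) can be sketched as follows, assuming the feed is built as a PHP 7+ string and the script file is saved as UTF-8. rss_prologue() is a hypothetical helper, and the Content-Type value application/rss+xml is a common choice rather than something this thread specifies.

```php
<?php
// Hypothetical helper: start an RSS document whose declared encoding
// matches the bytes that will actually be written (UTF-8).
function rss_prologue(string $title): string
{
    $title = htmlspecialchars($title, ENT_QUOTES | ENT_XML1, 'UTF-8');
    // The XML declaration must be the very first thing in the output,
    // with no whitespace or byte order mark before it.
    return '<?xml version="1.0" encoding="UTF-8"?>' . "\r\n"
         . '<rss version="2.0">' . "\r\n"
         . "<channel>\r\n"
         . "<title>{$title}</title>\r\n";
}

// Only relevant when the feed is served by a web request; a cron job that
// writes a static .xml file relies on the web server's headers instead.
if (PHP_SAPI !== 'cli' && !headers_sent()) {
    header('Content-Type: application/rss+xml; charset=UTF-8');
}

echo rss_prologue("El Versículo del Día");
```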
Hi Richard:
Having <?xml version="1.0" encoding="UTF-8"?> as the first line didn't
correct the problem.
The RSS feed is at
http://www.elversiculodeldia.info/peticiones-de-rezo-rss.xml
The feed validator reports a variety of errors related to accented
characters:
http://validator.w3.org/feed/check.cgi?url=http%3A%2F%2Fwww.elversiculodeldia.info%2Fpeticiones-de-rezo-rss.xml
Also, while viewing the feed in Firefox, once the first accented character
is displayed none of the rest of the feed is visible, except by
right-clicking and choosing "View Source".
The RSS feed content will be populated by a database query. The database
columns are set to utf8_unicode_ci.
How should I proceed?
Ron
The í is being received as the single byte 0xED. I saved a copy of the
feed for inspection with:
php -r "file_put_contents('a.rss',
file_get_contents('http://www.elversiculodeldia.info/peticiones-de-rezo-rss.xml'));"
This is NOT UTF-8 encoded data; it is most likely ISO-8859-1 (Latin-1).
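One way to check the bytes yourself is sketched below, assuming PHP 7+. describe_bytes() is a hypothetical helper; it uses the preg_match('//u', ...) idiom, which returns 1 only for valid UTF-8, so it does not depend on the mbstring extension.

```php
<?php
// Report whether a string is valid UTF-8 and list its non-ASCII bytes;
// a lone 0xED (rather than 0xC3 0xAD) indicates Latin-1 data.
function describe_bytes(string $data): string
{
    $valid = preg_match('//u', $data) === 1 ? 'valid UTF-8' : 'NOT valid UTF-8';
    // Without the /u modifier the pattern matches raw bytes >= 0x80.
    preg_match_all('/[\x80-\xFF]/', $data, $m);
    $hex = implode(' ', array_map('bin2hex', $m[0]));
    return "$valid; high bytes: " . ($hex === '' ? 'none' : $hex);
}

// Hypothetical usage against the saved copy of the feed:
// echo describe_bytes(file_get_contents('a.rss')), PHP_EOL;
echo describe_bytes("D\xEDa"), PHP_EOL;      // NOT valid UTF-8; high bytes: ed
echo describe_bytes("D\xC3\xADa"), PHP_EOL;  // valid UTF-8; high bytes: c3 ad
```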
So, as I see it, you have two choices: either use
<?xml version="1.0" encoding="ISO-8859-1"?> as the XML declaration,
or convert the encoded data to UTF-8.
It also means that the data in the SQL server is NOT UTF-8 and will
need to be converted as well.
I would recommend doing that first.
That will mean reading the data as ISO-8859-1 and converting it to
UTF-8 and then saving it again.
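A minimal sketch of the per-value conversion for such a one-off migration, assuming PHP 7+ with the iconv extension. latin1_to_utf8() is a hypothetical helper, and the sample row stands in for a SELECT result, since the thread does not name the table or columns. It skips values that are already valid UTF-8 so that running the migration twice does not double-encode them.

```php
<?php
// Hypothetical migration helper: convert ISO-8859-1 text to UTF-8, leaving
// values that are already valid UTF-8 (including plain ASCII) untouched.
function latin1_to_utf8(string $value): string
{
    if (preg_match('//u', $value) === 1) {
        return $value;
    }
    // Every ISO-8859-1 byte maps to exactly one Unicode code point,
    // so this conversion cannot fail.
    return iconv('ISO-8859-1', 'UTF-8', $value);
}

// Stand-in for rows fetched from the table being fixed (names assumed).
$rows = [['id' => 1, 'title' => "El Vers\xEDculo del D\xEDa"]];
foreach ($rows as $row) {
    $fixed = latin1_to_utf8($row['title']);
    // A real migration would run an UPDATE ... WHERE id = ? here with
    // $fixed, over a connection whose character set is utf8.
    echo $row['id'], ': ', bin2hex($fixed), PHP_EOL;
}
```

The validity check is a heuristic: Latin-1 text that happens to form valid UTF-8 would be skipped, though that is rare in Spanish prose. The bytes PHP receives also depend on the connection character set, not only on the column definition, so with utf8_unicode_ci columns it is worth checking the connection setting (for example, SET NAMES) before rewriting stored data.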
I'd also be looking at the app that inputs the data into the DB initially.
Here are two ways to convert the text; I'm sure there are more.
<?php
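// "\xED" is í in ISO-8859-1, so this string is Latin-1 however the file is saved.
$iso_text = "El Vers\xEDculo del D\xEDa: Pray For Others: Incoming Prayer Requests";
// Option 1: utf8_encode() converts ISO-8859-1 to UTF-8.
$utf_8_text = utf8_encode($iso_text);
var_dump($iso_text, $utf_8_text);
// Option 2: iconv() with explicit source and target encodings.
$utf_8_text = iconv('ISO-8859-1', 'UTF-8', $iso_text);
var_dump($iso_text, $utf_8_text);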
?>
outputs ...
string(63) "El Vers�culo del D�a: Pray For Others: Incoming Prayer Requests"
string(65) "El Versículo del Día: Pray For Others: Incoming Prayer Requests"
string(63) "El Vers�culo del D�a: Pray For Others: Incoming Prayer Requests"
string(65) "El Versículo del Día: Pray For Others: Incoming Prayer Requests"
Notice that the correct strings are 2 bytes longer: in UTF-8, í (U+00ED)
is encoded as the two bytes 0xC3 0xAD, whereas ISO-8859-1 uses the single
byte 0xED.
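The byte arithmetic behind the 63 versus 65 lengths can be checked directly. A minimal sketch, assuming PHP 7+ with the iconv extension; note that utf8_encode() is deprecated as of PHP 8.2, so iconv() or mb_convert_encoding($s, 'UTF-8', 'ISO-8859-1') is the safer choice in new code.

```php
<?php
// Each of the two accented letters grows from one byte (Latin-1)
// to two bytes (UTF-8), so the converted string is 2 bytes longer.
$iso_text = "El Vers\xEDculo del D\xEDa: Pray For Others: Incoming Prayer Requests";

// Same conversion as Option 2 in the example above.
$utf_8_text = iconv('ISO-8859-1', 'UTF-8', $iso_text);

echo strlen($iso_text), ' ', strlen($utf_8_text), PHP_EOL;  // 63 65
echo bin2hex("\xED"), ' -> ', bin2hex(iconv('ISO-8859-1', 'UTF-8', "\xED")), PHP_EOL;  // ed -> c3ad
```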
Richard, I was unaware of the utf8_encode function. Thank you very much;
this now works. Now I can continue with the translation into Spanish.
Ron
--
PHP General Mailing List (http://www.php.net/)
To unsubscribe, visit: http://www.php.net/unsub.php