At 01:18 23/07/2007, you wrote:
Message-ID: <D0.83.42241.94332A64@xxxxxxxxxxxx>
From: aldnin <aldnin@xxxxxxxx>
Subject: Re: PHP + PostgreSQL: invalid byte sequence for encoding
> This indicates that PHP is not using UTF-8. That output is typical of
> UTF-8 bytes displayed as Latin characters.
> I had similar problems getting PHP to work with UTF-8 and MySQL. Many
> of PHP's functions are not multibyte aware and assume a Latin character set.
> What, if any, output buffering are you using? What is your
> default_charset set to?
Well, I've set default_charset to UTF8 (it was
"" (empty) before), but the output on the
console (CLI) and the problem are still the same
after the change. So this is not the problem,
and I don't need proper output on the console
without utf8_decode(): if I want proper output
there I just decode, as I do when I want it
output properly in the browser.
Maybe a clearer explanation of the problem:
I fetch something from the database, which looks
like "lacarrière" when I output it in PHP -
well, let's not get confused by PHP's output.
Then I fetch something from another resource
that looks like "lacarrière" - when I compare both
strings in PHP it tells me that they are "not equal".
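A minimal sketch of why such a comparison fails, assuming (this is not confirmed in the thread) that one string arrives as UTF-8 and the other as ISO-8859-1. The variable names and byte values are illustrative only. PHP compares strings byte by byte, so the same visible word in two encodings is not equal:

```php
<?php
// Assumption: the same word, once as UTF-8 and once as ISO-8859-1 (Latin-1).
$fromDatabase = "lacarri\xC3\xA8re"; // "è" stored as two UTF-8 bytes
$fromOther    = "lacarri\xE8re";     // "è" stored as one Latin-1 byte

// The comparison is done on raw bytes, so the strings differ.
var_dump($fromDatabase === $fromOther); // bool(false)

// Dumping the bytes shows where they diverge.
echo bin2hex($fromDatabase), "\n"; // 6c616361727269c3a87265
echo bin2hex($fromOther), "\n";    // 6c616361727269e87265
```

Dumping both values with bin2hex() before comparing is a quick way to see which side is in which encoding.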
default_charset seems to affect only the output
buffer, so the only solution for that problem
would be a mechanism to tell PHP to handle all
strings as UTF-8 encoded bytes, which would take
a lot more resources for this process - I
understand that this is not a solution.
So the only solutions could be:
a) Decode and encode UTF-8 content properly, and
check whether the content is UTF-8 encoded so
that it is decoded before it is used with other strings
b) A mechanism to tell the pg-functions in PHP
to decode all data which is UTF-8 encoded. The
ADOdb layer seems to do that properly, but the
pg-functions don't, as far as I can see.
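Option a) could be sketched roughly as follows. The helper name to_utf8() is hypothetical, the mbstring extension is assumed to be loaded, and the sketch assumes that any input which is not valid UTF-8 is ISO-8859-1. It converts towards UTF-8 rather than decoding away from it, so both sides end up in the same encoding before they are compared:

```php
<?php
// Hypothetical helper: return $s as UTF-8.
// Assumption: anything that is not valid UTF-8 is ISO-8859-1.
function to_utf8(string $s): string
{
    if (mb_check_encoding($s, 'UTF-8')) {
        return $s; // already valid UTF-8, leave untouched
    }
    return mb_convert_encoding($s, 'UTF-8', 'ISO-8859-1');
}

$utf8   = "lacarri\xC3\xA8re"; // UTF-8 bytes
$latin1 = "lacarri\xE8re";     // Latin-1 bytes

// After normalising, the byte-wise comparison succeeds.
var_dump(to_utf8($utf8) === to_utf8($latin1)); // bool(true)
```

The validity check is only a heuristic: a Latin-1 string whose bytes happen to form valid UTF-8 would pass through unchanged, so knowing the encoding of each source is more reliable than guessing.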
Try to send "select 'lacarrière' as test;" with
pg_query to any postgres database, you'll get an
error. If not... well, then I'm wrong and I've
set up PHP wrong to handle UTF-8 stuff.
There are several areas where encoding issues can
arise between PHP (client) and DB server. One
which you've not considered is the client
connection, that is, the encoding used when transferring resultsets to PHP.
I ran into this a few weeks ago in MySQL while
stashing XML recordsets with non-ISO-8859-1 content.
The solution is pretty simple once you hit it,
and works in both MySQL and PostgreSQL because it's standard SQL-92:
$query = "SET NAMES 'UTF8'";
Issue that at the time you first make your
connection in your DB abstraction library - you
can send the query immediately after establishing
the connection, and all subsequent queries using
that connection will have the charset for transfer correctly stated.
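For the PostgreSQL side, a minimal sketch of doing this right after connecting, using PHP's pgsql extension. The helper name connect_utf8() and the connection string are hypothetical. The sketch uses pg_set_client_encoding(), which as far as I know has the same effect as sending the SET statement yourself on that connection:

```php
<?php
// Hypothetical helper: open a connection and fix its client encoding at once.
// The connection string passed in is a placeholder, not from this thread.
function connect_utf8(string $connString)
{
    $conn = pg_connect($connString);
    if ($conn === false) {
        throw new RuntimeException('Could not connect to PostgreSQL');
    }
    // pg_set_client_encoding() returns 0 on success and -1 on error.
    if (pg_set_client_encoding($conn, 'UTF8') !== 0) {
        throw new RuntimeException('Could not set client encoding to UTF8');
    }
    return $conn;
}
```

After something like $conn = connect_utf8('dbname=test'), calling pg_client_encoding($conn) reports the encoding in effect, which is a quick way to confirm the setting took hold before running other queries.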
@see :
'21.2.3. Automatic Character Set Conversion Between Server and Client'
http://www.postgresql.org/docs/8.1/static/multibyte.html
HTH
Cheers - Neil
--
PHP Database Mailing List (http://www.php.net/)
To unsubscribe, visit: http://www.php.net/unsub.php