Re: PHP + PostgreSQL: invalid byte sequence for encoding

At 01:18 23/07/2007, you wrote:
Message-ID: <D0.83.42241.94332A64@xxxxxxxxxxxx>
From: aldnin <aldnin@xxxxxxxx>
Subject: Re:  PHP + PostgreSQL: invalid byte sequence for encoding

> This indicates that PHP is not using UTF-8.  That output is typical of
> UTF-8 rendered as Latin characters.

> I had similar problems getting PHP to work with UTF-8 and MySQL.  Many
> of PHP's functions are not multibyte aware and assume a Latin character set.
> What, if any, output buffering are you using? What is your
> default_charset set to?

Well, I've set default_charset to UTF8; it was previously "" (empty). The output on the console (CLI) and the problem are still the same after the change, so this is not the cause. I also don't need correct console output without utf8_decode(): if I want proper output there I just decode, as I do when I want it displayed properly in the browser.

Maybe a cleaner explanation of the problem:

I fetch something from the database, which looks like "lacarrière" when I output it in PHP (let's not get confused by PHP's output). Then I fetch something from another resource that looks like "lacarrière". When I compare the two strings in PHP, it tells me that they are "not equal".
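A minimal sketch of why such a comparison can fail, assuming one source delivers the name as UTF-8 and the other as ISO-8859-1. The byte literals are illustrative stand-ins for the fetched values, and the iconv extension is assumed to be available:

```php
<?php
// "lacarrière" with the è encoded two ways (illustrative literals, not real query results).
$fromDatabase = "lacarri\xC3\xA8re"; // UTF-8: è is the two bytes C3 A8
$fromOther    = "lacarri\xE8re";     // ISO-8859-1: è is the single byte E8

// PHP compares strings byte by byte, so the two values differ.
$rawEqual = ($fromDatabase === $fromOther);

// Convert the Latin-1 value to UTF-8 first, then compare again.
$normalized      = iconv('ISO-8859-1', 'UTF-8', $fromOther);
$normalizedEqual = ($fromDatabase === $normalized);

var_dump($rawEqual);        // bool(false)
var_dump($normalizedEqual); // bool(true)
```

The point is only that both sides have to be in the same encoding before they are compared; which side gets converted is a design choice.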

default_charset seems to affect only the output buffer, so a fix for this problem would have to be a mechanism that tells PHP to treat all strings as UTF-8 encoded, which would presumably cost a lot more resources. I understand that this is not a solution.

So the only solutions could be:

a) Decode and encode UTF-8 data properly, taking care that content which is UTF-8 encoded is decoded before it is used with other strings.

b) A mechanism to tell the pg functions in PHP to decode all data that is UTF-8 encoded. The ADODB layer seems to do that properly, but as far as I can see the pg functions don't.

Try to send "select 'lacarrière' as test;" with pg_query to any Postgres database and you'll get an error. If not, well, then I'm wrong and I've set up PHP incorrectly for handling UTF-8.
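One plausible cause of an "invalid byte sequence" error on such a query is that the statement text reaches the server as Latin-1 bytes while the connection expects UTF-8. The sketch below checks a statement before it is sent; is_valid_utf8() is a hypothetical helper, not part of PHP's pg functions, and the two statements are illustrative byte literals:

```php
<?php
// Hypothetical helper: true if $s is a well-formed UTF-8 byte sequence.
function is_valid_utf8(string $s): bool
{
    // With the /u modifier, preg_match() returns false for a malformed UTF-8 subject.
    return preg_match('//u', $s) === 1;
}

$utf8Sql   = "select 'lacarri\xC3\xA8re' as test;"; // è as UTF-8 (C3 A8)
$latin1Sql = "select 'lacarri\xE8re' as test;";     // è as ISO-8859-1 (E8)

var_dump(is_valid_utf8($utf8Sql));   // bool(true)
var_dump(is_valid_utf8($latin1Sql)); // bool(false)
```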



There are several areas where encoding issues can arise between PHP (client) and DB server. One which you've not considered is the client connection, that is, the encoding used when transferring result sets to PHP.

I met this a few weeks ago in MySQL while stashing XML recordsets with non ISO-8859-1 content.

The solution is pretty simple once you hit it, and works in both MySQL and PGSQL because it's standard SQL-92:

$query="SET NAMES 'UTF8'";

Issue that when you first make your connection in your DB abstraction library: send the query immediately after establishing the connection, and all subsequent queries using that connection will have the charset for transfer correctly stated.
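A rough sketch of that step using PHP's pg functions. connect_with_utf8() and its connection string are hypothetical, and the encoding is spelled 'UTF8' here because that is PostgreSQL's name for it (MySQL likewise expects the name without a hyphen):

```php
<?php
// Hypothetical helper: open a PostgreSQL connection and declare the client
// encoding before any other query runs on it.
// $connString is an assumption, e.g. "host=localhost dbname=test user=me".
function connect_with_utf8(string $connString)
{
    $conn = pg_connect($connString);
    if ($conn === false) {
        throw new RuntimeException('Could not connect to PostgreSQL');
    }

    // First statement on the new connection: state the transfer charset.
    if (pg_query($conn, "SET NAMES 'UTF8'") === false) {
        throw new RuntimeException('SET NAMES failed: ' . pg_last_error($conn));
    }

    return $conn;
}
```

With the pg functions, pg_set_client_encoding($conn, 'UTF8') should have the same effect as the SET NAMES query; the query form is shown because it is the one that carries over to other drivers.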

@see :
'21.2.3. Automatic Character Set Conversion Between Server and Client'
http://www.postgresql.org/docs/8.1/static/multibyte.html


HTH
Cheers - Neil
