Ken Tanzer wrote:
> Hi. I've got a recurring problem with character encoding for a Postgres-based web PHP app, and am
> hoping someone can clue me in or at least point me in the right direction. I'll confess upfront my
> understanding of encoding issues is extremely limited. Here goes.
>
> The app uses a Postgres database, UTF-8 encoded. Through their browsers, users can add and edit
> records often including text. Most of the time this works fine. Though sometimes this will fail with
> Postgres complaining, for example, "Could query with ... , The error text was: ERROR: invalid byte
> sequence for encoding "UTF8": 0xe9 0x20 0x67"
>
> So this generally happens when people copy and paste things out of their word documents and such.
>
> As I understand it, those are likely encoded in something non-UTF-8, like WIN-1251 or something. And
> that one way or another, the encoding needs to be translated before it can be placed into the
> database. I'm not clear how this is supposed to happen though. Automatically by the browser? Done
> in the app? Some other way? And if in the app, how is one supposed to know what the incoming
> encoding is?
>
> Thanks in advance for any help or pointers.

The byte sequence 0xe9 0x20 0x67 means "é g" in ISO-8859-1 and WINDOWS-1252,
so I think that your setup is as follows:

- The PHP application gets data encoded in ISO-8859-1 or WINDOWS-1252
  and tries to store it in the database.
- The PHP application has a database connection with client_encoding
  set to UTF8.

Then the database thinks it gets UTF-8 and will choke if it gets something different.

The solution:
- Make sure that your web application gets data in only one encoding.
- Set client_encoding to that encoding.

Yours,
Laurenz Albe

-- 
Sent via pgsql-general mailing list (pgsql-general@xxxxxxxxxxxxxx)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general
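To make the diagnosis above concrete, here is a small sketch in Python (used only for illustration; the app in question is PHP). It reproduces the failing bytes from the error message, shows that they are not valid UTF-8, and decodes them as Windows-1252 to get "é g". The names `BAD_BYTES`, `is_valid_utf8` and `normalize_to_utf8` are hypothetical. The fallback to cp1252 is a heuristic I am adding, not the fix recommended above: it assumes that anything that is not valid UTF-8 came from a Windows-1252 source such as pasted Word text, and that guess can be wrong.

```python
# Hypothetical sketch: reproduce the byte sequence from the PostgreSQL error
# and normalise incoming text to UTF-8 before it is sent to the database.

# Bytes taken from: invalid byte sequence for encoding "UTF8": 0xe9 0x20 0x67
BAD_BYTES = bytes([0xE9, 0x20, 0x67])


def is_valid_utf8(raw: bytes) -> bool:
    """Return True if raw decodes cleanly as UTF-8."""
    try:
        raw.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


def normalize_to_utf8(raw: bytes, fallback: str = "cp1252") -> str:
    """Decode raw as UTF-8, or as `fallback` if it is not valid UTF-8.

    Assumption: non-UTF-8 input was produced by a Windows-1252 source.
    Python's cp1252 codec still raises UnicodeDecodeError for the five
    byte values that Windows-1252 leaves undefined.
    """
    if is_valid_utf8(raw):
        return raw.decode("utf-8")
    return raw.decode(fallback)


if __name__ == "__main__":
    text = normalize_to_utf8(BAD_BYTES)
    print(is_valid_utf8(BAD_BYTES))  # False: 0xe9 starts a 3-byte UTF-8 sequence, 0x20 cannot continue it
    print(text.encode("utf-8"))      # b'\xc3\xa9 g': the UTF-8 bytes for "é g"
```

Guessing per request like this is fragile. The approach in the reply is more robust: serve the forms in one known encoding so browsers submit that encoding, and set client_encoding to match. If the app consistently received Windows-1252, for example, `SET client_encoding TO 'WIN1252';` would let PostgreSQL convert the data to the UTF8 server encoding itself.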