Re: UTF8 encoding problem


On Wed, Jun 18, 2008 at 08:25:07AM +0200, Giorgio Valoti wrote:
> On 18/giu/08, at 03:04, Michael Fuhr wrote:
> > Is the data UTF-8?  If the error is 'invalid byte sequence for  
> > encoding "UTF8": 0xa3' then you probably need to set client_encoding
> > to latin1, latin9, or win1252.
> 
> Why?

UTF-8 has rules about what byte values can occur in sequence;
violations of those rules mean that the data isn't valid UTF-8.
This particular error says that the database received a byte with
the value 0xa3 (163) in a sequence of bytes that wasn't valid UTF-8.
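A strict UTF-8 decoder outside the database rejects a lone 0xa3 in much the same way. The sketch below uses Python's built-in codec as an illustration; the helper name is_valid_utf8 and the sample byte strings are made up for the example and do not come from the thread.

```python
def is_valid_utf8(data: bytes) -> bool:
    """Return True if data decodes cleanly as strict UTF-8."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


# Hypothetical sample data: the same text with the pound sign as a
# lone 0xa3 (as in Latin-1) and as the UTF-8 pair 0xc2 0xa3.
lone = b"Price: \xa35"
paired = b"Price: \xc2\xa35"

print(is_valid_utf8(lone))    # False
print(is_valid_utf8(paired))  # True
```

0xa3 has the bit pattern 10100011, which UTF-8 reserves for continuation bytes, so it is only legal after a suitable lead byte.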

The UTF-8 byte sequence for the pound sign (£) is 0xc2 0xa3.  If
Garry got this error (I don't know if he did; I was asking) then
the byte 0xa3 must have appeared in some other sequence that wasn't
valid UTF-8.  The usual reason for that is that the data is in some
encoding other than UTF-8.
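To see where 0xc2 0xa3 comes from, the two-byte UTF-8 form (110xxxxx 10xxxxxx) can be worked out by hand from the code point U+00A3. This is only a sketch for this one character, written in Python, with variable names chosen for the example.

```python
code_point = ord("£")  # U+00A3 = 163

# Two-byte UTF-8 form: 110xxxxx 10xxxxxx
lead = 0b11000000 | (code_point >> 6)          # top bits  -> 0xc2
cont = 0b10000000 | (code_point & 0b00111111)  # low 6 bits -> 0xa3

encoded = bytes([lead, cont])
print(encoded.hex(" "))  # c2 a3

# Cross-check against Python's own encoder.
assert encoded == "£".encode("utf-8")
```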

Common encodings for Western European languages are Latin-1
(ISO-8859-1), Latin-9 (ISO-8859-15), and Windows-1252.  All three
of these encodings use a lone 0xa3 to represent the pound sign.  If
the data has a pound sign as 0xa3 and the database complains that
it isn't part of a valid UTF-8 sequence then the data is likely to
be in one of these other encodings.
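A quick check of that claim, again as a Python sketch rather than anything from the thread: the byte 0xa3 decodes to the pound sign under all three encodings, and re-encoding the result as UTF-8 gives the pair 0xc2 0xa3. The codec names are Python's, not PostgreSQL's.

```python
raw = b"\xa3"

# Python codec names for Latin-1, Latin-9 and Windows-1252.
decoded = {name: raw.decode(name)
           for name in ("latin-1", "iso-8859-15", "cp1252")}

for name, text in decoded.items():
    print(name, hex(ord(text)))  # each line ends in 0xa3, i.e. U+00A3

# Converting to UTF-8 yields the valid two-byte sequence.
fixed = raw.decode("cp1252").encode("utf-8")
print(fixed)  # b'\xc2\xa3'
```

This only shows that the three encodings agree on 0xa3. It does not tell you which one the data is really in; they differ elsewhere (0xa4 is the euro sign in Latin-9 but not in Latin-1, and Windows-1252 puts the euro sign at 0x80).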

-- 
Michael Fuhr

