Search Postgresql Archives

Re: Trouble with UTF-8 data

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Tom Lane wrote:
>> But I'm still getting this error when loading the data into the new  
>> database:
> 
>> ERROR:  invalid byte sequence for encoding "UTF8": 0xeda7a1
> 
> The reason PG doesn't like this sequence is that it corresponds to
> a Unicode "surrogate pair" code point, which is not supposed to
> ever appear in UTF-8 representation --- surrogate pairs are a kluge for
> UTF-16 to deal with Unicode code points of more than 16 bits.

0xEDA7A1 (UTF-8) corresponds to UNICODE code point 0xD9E1, which,
when interpreted as a high surrogare and followed by a low surrogate,
would correspond to the UTF-16 encoding of a code point
between 0x88400 and 0x887FF (depending on the value of the low surrogate).

These code points do not correspond to any valid character.
So - unless there is a flaw in my reasoning - there's something
fishy with these data anyway.

Janine, could you give us a hex dump of that line from the copy statement?

Yours,
Laurenz Albe

---------------------------(end of broadcast)---------------------------
TIP 1: if posting/reading through Usenet, please send an appropriate
       subscribe-nomail command to majordomo@xxxxxxxxxxxxxx so that your
       message can get through to the mailing list cleanly


[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
[Index of Archives]     [Postgresql Jobs]     [Postgresql Admin]     [Postgresql Performance]     [Linux Clusters]     [PHP Home]     [PHP on Windows]     [Kernel Newbies]     [PHP Classes]     [PHP Books]     [PHP Databases]     [Postgresql & PHP]     [Yosemite]
  Powered by Linux