Re: MSSQL to PostgreSQL : Encoding problem

"Brandon Aiken" <BAiken@xxxxxxxxxxxxxxx> · Wed, 22 Nov 2006 13:55:55 -0500

It might also be a big-endian/little-endian problem, although I always thought that was platform-specific, not locale-specific.

Try the UCS-2-INTERNAL and UCS-4-INTERNAL encodings in iconv, which should use the two-byte and four-byte forms of UCS in the system's native byte order.
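
To make the byte-order point concrete, here is a rough Python sketch (not iconv itself) of what the same text looks like in the two 16-bit byte orders. The sample string and the helper name swap_byte_pairs are made up for illustration; nothing here comes from the actual DTS export. It also shows why a 16-bit encoding would put 0x00 bytes into mostly-Latin text.

```python
import sys

# Hypothetical sample value; not taken from the real CSV file.
sample = "détail"

little = sample.encode("utf-16-le")  # low byte first
big = sample.encode("utf-16-be")     # high byte first

# The byte order this machine uses natively, which is what an
# "INTERNAL" encoding is described as following.
native = little if sys.byteorder == "little" else big


def swap_byte_pairs(raw: bytes) -> bytes:
    """Swap every pair of bytes, converting one 16-bit byte order to the other."""
    if len(raw) % 2:
        raise ValueError("expected an even number of bytes")
    swapped = bytearray(raw)
    swapped[0::2], swapped[1::2] = raw[1::2], raw[0::2]
    return bytes(swapped)


if __name__ == "__main__":
    print("little-endian:", little.hex())
    print("big-endian:   ", big.hex())
    # Every Latin-range character carries a zero byte in 16-bit encodings.
    print("contains 0x00:", b"\x00" in little)
```

If the export really is a 16-bit encoding, picking the wrong byte order would still decode without an error in most cases, just to the wrong characters, so it is worth checking a few known values by eye.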

There are many Unicode codepage formats that iconv supports:
UTF-8
ISO-10646-UCS-2 UCS-2 CSUNICODE
UCS-2BE UNICODE-1-1 UNICODEBIG CSUNICODE11
UCS-2LE UNICODELITTLE
ISO-10646-UCS-4 UCS-4 CSUCS4
UCS-4BE
UCS-4LE
UTF-16
UTF-16BE
UTF-16LE
UTF-32
UTF-32BE
UTF-32LE
UNICODE-1-1-UTF-7 UTF-7 CSUNICODE11UTF7
UCS-2-INTERNAL
UCS-2-SWAPPED
UCS-4-INTERNAL
UCS-4-SWAPPED
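
If iconv is not handy, the same recoding can be sketched in Python. This is only an illustration under assumptions: the function name utf16_to_utf8 and the default_order parameter are mine, and I am guessing that a file without a byte-order mark is little-endian, which may not hold for your export.

```python
import codecs


def utf16_to_utf8(raw: bytes, default_order: str = "le") -> bytes:
    """Decode UTF-16 bytes and re-encode them as UTF-8.

    A leading byte-order mark decides the byte order when present;
    otherwise default_order ("le" or "be") is assumed.
    """
    if default_order not in ("le", "be"):
        raise ValueError("default_order must be 'le' or 'be'")
    if raw.startswith((codecs.BOM_UTF16_LE, codecs.BOM_UTF16_BE)):
        # The plain "utf-16" codec reads the BOM and strips it.
        text = raw.decode("utf-16")
    else:
        text = raw.decode("utf-16-" + default_order)
    return text.encode("utf-8")


if __name__ == "__main__":
    # Hypothetical CSV line, encoded big-endian with a BOM.
    data = codecs.BOM_UTF16_BE + "1,champ,détail\n".encode("utf-16-be")
    print(utf16_to_utf8(data))
```

For a real file you would read it in binary mode, pass the bytes through the function, and write the result out in binary mode, then load it with client_encoding set to UTF8.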

Gee, didn't Unicode just so simplify this codepage mess?  Remember when it was just ASCII, EBCDIC, ANSI, and localized codepages?

--
Brandon Aiken
CS/IT Systems Engineer
-----Original Message-----
From: pgsql-general-owner@xxxxxxxxxxxxxx [mailto:pgsql-general-owner@xxxxxxxxxxxxxx] On Behalf Of Arnaud Lesauvage
Sent: Wednesday, November 22, 2006 12:38 PM
To: Arnaud Lesauvage; General
Subject: Re: [GENERAL] MSSQL to PostgreSQL : Encoding problem

Alvaro Herrera wrote:
> Arnaud Lesauvage wrote:
>> Alvaro Herrera wrote:
>> >Arnaud Lesauvage wrote:
>> >
>> >>mydb=# SET client_encoding TO LATIN9;
>> >>SET
>> >>mydb=# COPY statistiques.detailrecherche (log_gid, 
>> >>champrecherche, valeurrecherche) FROM 
>> >>'E:\\Production\\Temp\\detailrecherche_ansi.csv' CSV;
>> >>ERROR:  invalid byte sequence for encoding "LATIN9": 0x00
>> >>HINT:  This error can also happen if the byte sequence does 
>> >>not match the encoding expected by the server, which is 
>> >>controlled by "client_encoding".
>> >
>> >Huh, why do you have a "0x00" byte in there?  That's certainly not
>> >Latin9 (nor UTF8 as far as I know).
>> >
>> >Is the file actually Latin-something or did you convert it to something
>> >else at some point?
>> 
>> This is the file generated by DTS with "ANSI" encoding. It 
>> was not altered in any way after that!
>> The doc states that ANSI exports with the local codepage 
>> (which is Win1252). That's all I know. :(
> 
> I thought Win1252 was supposed to be almost the same as Latin1.  While
> I'd expect certain differences, I wouldn't expect it to use 0x00 as
> data!
> 
> Maybe you could have DTS export Unicode, which would presumably be
> UTF-16, then recode that to something else (possibly UTF-8) with GNU
> iconv.

UTF-16! That's something I haven't tried!
I'll try an iconv conversion from UTF-16 to UTF-8 tomorrow!

--
Arnaud
