Richard Huxton a écrit :
Arnaud Lesauvage wrote:
Hi list !
I already posted this as "COPY FROM encoding error", but I have been
doing some more tests since then.
I'm trying to export data from MS SQL Server to PostgreSQL.
The tables are quite big (>20M rows), so a CSV export and a "COPY FROM3
import seems to be the only reasonable solution.
Or go via MS-Access/Perl and ODBC/DBI perhaps?
Yes, I think it would work. The problem is that the DB is
too big for this king of export. Using DTS from MSSQL to
export directly to PostgreSQL using psqlODBC Unicode Driver,
I exported ~1000 rows per second in a 2-columns table with
~20M rows. That means several days just for this table, and
I have bigger ones !
In DTS, I have 3 options to export a table as a text file : ANSI, OEM
and UNICODE.
I tried all these options (and I have three files, one for each).
Well, what character-set is your database in?
Collation in MSSQL is SQL_Latin1_General_CP1_Cl_AS.
DTS documentation tells me that exporting in ANSI should
export using the current codepage.
According to my local setting, my codepage is Windows-1252.
This file is not correctly read by COPY when using
client_encoding of WIN1252 though...
I then try to import into PostgreSQL. The farther I can get is when
using the UNICODE export, and importing it using a client_encoding set
to UTF8 (I tried WIN1252, LATIN9, LATIN1, ...).
The copy then stops with an error :
ERROR: invalid byte sequence for encoding "UTF8": 0xff
État SQL :22021
The problematic character is the euro currency symbol.
You'll want UTF-8 or LATIN9 for the euro symbol. LATIN1 supports that
character-number but it is used for a different symbol.
Your first step needs to be to find out what character-set your data is in.
Your second is then to decide what char-set you want to use to store it
in PG.
Then you can decide how to get there.
In PG, UTF8 was my choice (the DB already exists, I am just
adding some tables that are still stored in MSSQL), and
according to what you say this was the right choice.
The problem is really about reading this file I think.
I thought that given the character sets available in
PostgreSQL, I would be able to COPY directly from my
exported files.
If I have to convert them using some third party tool, I'll
do that, but that's a bit more painful...
--
Arnaud