Search Postgresql Archives

Re: MSSQL to PostgreSQL : Encoding problem

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Richard Huxton a écrit :
Arnaud Lesauvage wrote:
Hi list !

I already posted this as "COPY FROM encoding error", but I have been doing some more tests since then.

I'm trying to export data from MS SQL Server to PostgreSQL.
The tables are quite big (>20M rows), so a CSV export and a "COPY FROM3 import seems to be the only reasonable solution.

Or go via MS-Access/Perl and ODBC/DBI perhaps?

Yes, I think it would work. The problem is that the DB is too big for this king of export. Using DTS from MSSQL to export directly to PostgreSQL using psqlODBC Unicode Driver, I exported ~1000 rows per second in a 2-columns table with ~20M rows. That means several days just for this table, and I have bigger ones !


In DTS, I have 3 options to export a table as a text file : ANSI, OEM and UNICODE.
I tried all these options (and I have three files, one for each).

Well, what character-set is your database in?


Collation in MSSQL is SQL_Latin1_General_CP1_Cl_AS.
DTS documentation tells me that exporting in ANSI should export using the current codepage.
According to my local setting, my codepage is Windows-1252.
This file is not correctly read by COPY when using client_encoding of WIN1252 though...


I then try to import into PostgreSQL. The farther I can get is when using the UNICODE export, and importing it using a client_encoding set to UTF8 (I tried WIN1252, LATIN9, LATIN1, ...).
The copy then stops with an error :
ERROR: invalid byte sequence for encoding "UTF8": 0xff
État SQL :22021

The problematic character is the euro currency symbol.

You'll want UTF-8 or LATIN9 for the euro symbol. LATIN1 supports that character-number but it is used for a different symbol.

Your first step needs to be to find out what character-set your data is in.
Your second is then to decide what char-set you want to use to store it in PG.
Then you can decide how to get there.

In PG, UTF8 was my choice (the DB already exists, I am just adding some tables that are still stored in MSSQL), and according to what you say this was the right choice.
The problem is really about reading this file I think.

I thought that given the character sets available in PostgreSQL, I would be able to COPY directly from my exported files. If I have to convert them using some third party tool, I'll do that, but that's a bit more painful...

--
Arnaud


[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
[Index of Archives]     [Postgresql Jobs]     [Postgresql Admin]     [Postgresql Performance]     [Linux Clusters]     [PHP Home]     [PHP on Windows]     [Kernel Newbies]     [PHP Classes]     [PHP Books]     [PHP Databases]     [Postgresql & PHP]     [Yosemite]
  Powered by Linux