Search Postgresql Archives

Re: retry: converting ASCII to UTF-8

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Tom Hart wrote:
> Originally I set up the database to use ASCII encoding. However we
> have a large base of Spanish speaking members and services,
> and we need utf-8 encoding to maintain and support the extended character
> sets. In my naivety I thought it would be a relatively simple process 
> to convert the db but I've found this to not be the case. I tried doing 
> a dump and restore into a new database with the proper encoding, but 
> pg_restore is getting hung up on one of the tables, our largest by far 
> (~1gb, not huge I know). When I tried pg_restore from a command line
> (I was using pgAdmin, I know i'm a nub) I received this error.
> 
> C:\Program Files\PostgreSQL\8.2\bin>pg_restore.exe -i -h 192.168.1.xxx -p 5432 -U foobar -d warehouse_utf8 -a -t "transaction" -v "O:\foo\bar\pg_dump_transaction.backup"
> pg_restore: connecting to database for restore
> Password:
> pg_restore: restoring data for table "transaction"
> pg_restore: [archiver (db)] Error while PROCESSING TOC:
> pg_restore: [archiver (db)] Error from TOC entry 1706; 0 17861 TABLE DATA transaction foobar
> pg_restore: [archiver (db)] COPY failed: ERROR:  invalid byte sequence for encoding "UTF8": 0xc52f
> HINT:  This error can also happen if the byte sequence does not match the encoding expected by the server, which is controlled by "client_encoding".
> CONTEXT:  COPY transaction, line 209487
> WARNING: errors ignored on restore: 1

You will have to figure out in which encoding the data really are.

SQL_ASCII will allow anything in, and the client is responsible for
feeding the right thing.

Somebody stored a 0xC5 in the database, which is not ASCII.
In WIN-1252, ISO8859-1, and ISO8859-15 that would be Å.
Could that be?

Once you find out the correct encoding, you can uye the -E switch of
pg_dump to set that encoding for your exported data.

If your clients entered consistent data, that should work.

If different clients used different encodings, you might end up
sorting it out manually...

Yours,
Laurenz Albe

---------------------------(end of broadcast)---------------------------
TIP 6: explain analyze is your friend


[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
[Index of Archives]     [Postgresql Jobs]     [Postgresql Admin]     [Postgresql Performance]     [Linux Clusters]     [PHP Home]     [PHP on Windows]     [Kernel Newbies]     [PHP Classes]     [PHP Books]     [PHP Databases]     [Postgresql & PHP]     [Yosemite]
  Powered by Linux