Re: Encoding Conversion

Rick Gigger <rick@xxxxxxxxxxxxxxxxxxxx> · Wed, 10 May 2006 11:14:41 -0600

jef peeraer wrote:
beer wrote:
Hello All

So I have an old database that is SQL_ASCII encoded.  For a variety 
of reasons I need to convert the database to UNICODE.  I did some 
googling on this but have yet to find anything that looked like a 
viable option, so I thought I'd post to the group and see what sort 
of advice might arise. :)
Well, I recently struggled with the same problem. After a lot of trial 
and error and reading, it seems that a SQL_ASCII encoded database can't 
use its client encoding capabilities (set client_encoding to utf8).
I think the easiest solution is to do a dump, recreate the database 
with a proper encoding, and restore the dump.

jef peeraer

TIA

-b

---------------------------(end of broadcast)---------------------------
TIP 1: if posting/reading through Usenet, please send an appropriate
       subscribe-nomail command to majordomo@xxxxxxxxxxxxxx so that your
       message can get through to the mailing list cleanly

In my experience SQL_ASCII will let you put anything in there.  You need 
to figure out the actual encoding of the data.  Is it LATIN1?  Is it 
UTF-8?  UTF-16?  I found that my old SQL_ASCII dbs, before they were 
converted to unicode, contained 99.9% LATIN1 chars but also had a few 
random weird characters thrown in from people copying and pasting from 
Office.  For instance, MS Word uses non-ASCII, nonstandard characters 
to implement its "smart quotes" or whatever they call it, where the 
quotes curl in towards each other.
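
A minimal sketch of how one might check what is actually in a dump, 
written in Python rather than done by eye.  It assumes the dump has been 
read into memory as raw bytes; the function names scan_non_ascii and 
report are made up for this example.  It only counts bytes and guesses 
at a range, it does not prove what the encoding is.

```python
from collections import Counter


def scan_non_ascii(data):
    """Return (is_valid_utf8, Counter of byte values >= 0x80)."""
    try:
        data.decode("utf-8")
        valid_utf8 = True
    except UnicodeDecodeError:
        valid_utf8 = False
    counts = Counter(b for b in data if b >= 0x80)
    return valid_utf8, counts


def report(data):
    """Build a human-readable summary, one line per distinct byte."""
    valid_utf8, counts = scan_non_ascii(data)
    lines = ["valid UTF-8: %s" % valid_utf8]
    for byte, n in sorted(counts.items()):
        # 0x80-0x9F are control codes in LATIN1 but printable
        # punctuation (curly quotes, dashes) in Windows-1252.
        if byte <= 0x9F:
            kind = "Windows-1252 range"
        else:
            kind = "LATIN1 printable range"
        lines.append("0x%02X x%d (%s)" % (byte, n, kind))
    return lines


if __name__ == "__main__":
    # Hypothetical sample: a LATIN1 e-acute wrapped in Word curly quotes.
    sample = b"caf\xe9 \x93hi\x94"
    print("\n".join(report(sample)))
```

If the data decodes cleanly as UTF-8 and has bytes above 0x7F, it is 
probably already UTF-8.  If it does not, and most of the high bytes are 
0xA0 or above with a few in 0x80-0x9F, that matches the "mostly LATIN1 
plus some pasted Word characters" situation described above.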

I had to identify what the bad chars were.  I think that viewing the 
dump in vi showed me the hex codes for the non-ASCII chars.  Then I 
changed the encoding specified at the top of the dump to LATIN1.  Then I 
used sed to remove the bad chars as I piped the dump into a postgres 
unicode db.
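
For illustration, here is a rough Python stand-in for the sed step.  It 
is not the sed command that was actually used.  It assumes the bad chars 
are Windows-1252 punctuation bytes in the 0x80-0x9F range; the mapping 
table and the name clean_latin1 are made up for this example.  Instead 
of only deleting, it swaps the common ones for plain ASCII and drops the 
rest of that range.

```python
# Hypothetical mapping: Windows-1252 punctuation bytes -> plain ASCII.
CP1252_TO_ASCII = {
    0x85: b"...",  # ellipsis
    0x91: b"'",    # left curly single quote
    0x92: b"'",    # right curly single quote
    0x93: b'"',    # left curly double quote
    0x94: b'"',    # right curly double quote
    0x96: b"-",    # en dash
    0x97: b"-",    # em dash
}


def clean_latin1(data, drop_unknown=True):
    """Replace or drop bytes in 0x80-0x9F; leave everything else alone."""
    out = bytearray()
    for b in data:
        if 0x80 <= b <= 0x9F:
            if b in CP1252_TO_ASCII:
                out += CP1252_TO_ASCII[b]
            elif not drop_unknown:
                out.append(b)
            # otherwise the byte is silently dropped
        else:
            out.append(b)
    return bytes(out)


if __name__ == "__main__":
    dirty = b"\x93caf\xe9\x94 \x99"
    cleaned = clean_latin1(dirty)
    # What is left is plain LATIN1, so it converts to UTF-8 cleanly.
    print(cleaned.decode("latin-1").encode("utf-8"))
```

The cleaned bytes are valid LATIN1, so with the dump's encoding set to 
LATIN1 the server can do the conversion to unicode on restore.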

Rick