Re: Encoding Conversion


jef peeraer wrote:
beer wrote:
Hello All

So I have an old database that is SQL_ASCII encoded. For a variety of reasons I need to convert the database to UNICODE. I did some googling on this but have yet to find anything that looked like a viable option, so I thought I'd post to the group and see what sort of advice might arise. :)
Well, I recently struggled with the same problem. After a lot of trial and error and reading, it seems that an SQL_ASCII-encoded database can't use the client encoding conversion (set client_encoding to utf8), because the server performs no conversion at all when the database encoding is SQL_ASCII. I think the easiest solution is to do a dump, recreate the database with a proper encoding, and restore the dump.
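
A minimal sketch of the dump / recreate / restore route, assuming the standard pg_dump, createdb and psql command-line tools are on the PATH. The database names and the dump file name are placeholders, not anything from this thread, and the function only builds the command lines; it does not run them.

```python
import shlex


def migration_commands(old_db, new_db, dump_path="dump.sql"):
    """Build the three commands for a dump / recreate / restore cycle.

    old_db, new_db and dump_path are hypothetical placeholder names.
    """
    return [
        # 1. Plain-text SQL dump of the old SQL_ASCII database.
        ["pg_dump", "-f", dump_path, old_db],
        # 2. New database created with UTF8 encoding; cloning template0
        #    avoids inheriting the encoding of template1.
        ["createdb", "-E", "UTF8", "-T", "template0", new_db],
        # 3. Replay the dump into the new database.
        ["psql", "-d", new_db, "-f", dump_path],
    ]


def as_shell(commands):
    """Render the command lists as copy-pasteable shell lines."""
    return [shlex.join(cmd) for cmd in commands]
```

Printing `as_shell(migration_commands("olddb", "newdb"))` gives the three lines to review before running them by hand. The restore only converts correctly if the client_encoding named at the top of the dump matches what the bytes really are, so the dump usually needs inspecting first.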

jef peeraer

TIA

-b


---------------------------(end of broadcast)---------------------------
TIP 1: if posting/reading through Usenet, please send an appropriate
       subscribe-nomail command to majordomo@xxxxxxxxxxxxxx so that your
       message can get through to the mailing list cleanly






In my experience SQL_ASCII will let you put anything in there. You need to figure out the actual encoding of the data. Is it LATIN1? Is it UTF-8? UTF-16? I found that my old SQL_ASCII databases, before they were converted to Unicode, contained 99.9% LATIN1 characters but also had a few stray characters from people copying and pasting from Office. For instance, MS Word uses nonstandard, non-ASCII characters to implement its "smart quotes" (or whatever they call them), where the quotes curl in towards each other.
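
One way to get a first answer to "what encoding is this really?" is to look at the raw bytes of the dump. The sketch below is only a rough heuristic under stated assumptions: the function names are made up for illustration, and it can tell pure ASCII and valid UTF-8 apart from everything else, but it cannot distinguish LATIN1 from other single-byte encodings.

```python
from collections import Counter


def guess_encoding(data: bytes) -> str:
    """Rough classification of a byte string (heuristic, not a detector)."""
    try:
        data.decode("ascii")
        return "ascii"
    except UnicodeDecodeError:
        pass
    try:
        data.decode("utf-8")
        return "utf-8"
    except UnicodeDecodeError:
        # Not valid UTF-8: most likely a single-byte encoding such as
        # LATIN1, possibly with Windows-1252 characters mixed in.
        return "single-byte"


def non_ascii_histogram(data: bytes) -> Counter:
    """Count every byte value >= 0x80 so the odd ones stand out."""
    return Counter(b for b in data if b >= 0x80)
```

Reading the dump with `open(path, "rb").read()` and passing it to `non_ascii_histogram` shows which high bytes occur and how often. Bytes in the range 0x80 to 0x9F are a hint of Windows-1252 text pasted from Office, since LATIN1 defines only control characters there.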

I had to identify what the bad characters were. I think that viewing the dump in vi showed me the hex codes for the non-ASCII characters. Then I changed the encoding specified at the top of the dump to LATIN1. Then I used sed to strip the bad characters as I piped the dump into a Postgres Unicode database.
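
The same clean-up can be sketched in Python instead of sed. This is an illustration under assumptions, not the exact commands I ran: it assumes the bad bytes are the four Windows-1252 curly quotes (0x91 to 0x94), that the dump contains a line of the form `SET client_encoding = '...';`, and it replaces the quotes with plain ASCII ones rather than deleting them. The function names are hypothetical.

```python
import re

# Windows-1252 curly quotes mapped to plain ASCII quotes (assumed set of
# "bad" bytes; extend it with whatever the dump actually contains).
SMART_QUOTES = bytes.maketrans(b"\x91\x92\x93\x94", b"''\"\"")

# The client_encoding line that pg_dump writes near the top of the dump.
ENCODING_LINE = re.compile(rb"^(SET client_encoding = ')[^']*(';)")


def clean_line(line: bytes, encoding: bytes = b"LATIN1") -> bytes:
    """Fix the declared encoding and replace curly quotes in one line."""
    line = ENCODING_LINE.sub(rb"\g<1>" + encoding + rb"\g<2>", line)
    return line.translate(SMART_QUOTES)


def clean_stream(src, dst) -> None:
    """Copy a binary dump from src to dst, cleaning each line."""
    for line in src:
        dst.write(clean_line(line))
```

Called as `clean_stream(sys.stdin.buffer, sys.stdout.buffer)` in a small script, it can sit in a pipe between the dump file and psql. Working on bytes rather than text matters here, because the whole point is that the declared encoding of the dump cannot be trusted yet.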

Rick


