Re: upgrading a three year old server

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



tony wrote:

Can I change the encoding to UTF-8 easily?

I had to convert as well while upgrading to 8.1.0.

We came from 8.0.3; for some reason all our db's were 'LATIN1' in PG, and while things worked OK, we wanted to get rid of this too. All the data in our db's came from web-apps and forms. All the pages that were used to enter that data were UTF-8, so essentially the data *was* correct UTF8. First I went for the dirty hack and tried to fool PG by simply changing the LATIN1 string in the dump to UTF8; it seems to work but later on pg_restore chokes on the blobs.
After extensive research I found a solution that worked 100% for me.

On the old server, dump db including blobs etc:
pg_dump -Fc yourdb > yourdb.dump

On our new server (debian), the installation seemed to have run initdb with -E LATIN1, so I needed to remove my data tree first and the properly recreate it:

initd -E UTF-8 -D /var/lib/postgresql/8.1/main/data

After that I created the db but with the old LATIN1 encoding:

createdb -E LATIN1 yourdb

Then restore the old dump into it:

pg_restore yourdb yourdb.dump

From this moment on, the data on my webpages looks OK (I looked at some really foreign stuff my users had filled in, like Arabic and Hebrew (no pun intended)), but the db is still LATIN1.

Because 8.1 allows you to dump db's with blobs and all to TXT files, you dump the whole thing to TXT on your new server:

pg_dump -Fp yourdb > yourdb.txt.dump

Then edit the dumpfile manually and change the LATIN1 string to UTF8.
If you are sure your db does not actually contain the string LATIN1, you could do:

pg_dump -Fp yourdb | sed 's/LATIN1/UTF8/g' > yourdb.txt.dump

Now drop the db and recreate it with the correct encoding:

dropdb yourdb && createdb -E UTF8 yourdb

Now import the dump; this needs to be done with psql because it is txt:

psql yourdb < yourdb.txt.dump

At this time you will have a proper UTF8 db.

FYI, this solution worked for me, and I am pretty sure the reason why it worked it that our websites were 'UTF8-proof' from the beginning.


If everything is setup correct, you might see really cool strings appear on user-filled-in forms:

http://www.terena.nl/compendium/2005/basicinfo.php?nrenid=26

:)

Best regards,



--
  *    ***     Dick Visser         TIENHUIS Networking
 **   *   *    Touwbaan 68             P: +31206843731
  *   * ***    1018 HS Amsterdam       F: +31208641420
  *   * * *    The Netherlands         M: +31622698108
  *   **  *    IP-phone (SIP)/email:  dick@xxxxxxxxxxx
  *   *   *    PGP-key: http://www.tienhuis.nl/gpg.txt
  *   *   *    Webcam: http://www.tienhuis.nl/cam2.asx
 ***   ***


[Index of Archives]     [KVM ARM]     [KVM ia64]     [KVM ppc]     [Virtualization Tools]     [Spice Development]     [Libvirt]     [Libvirt Users]     [Linux USB Devel]     [Linux Audio Users]     [Yosemite Questions]     [Linux Kernel]     [Linux SCSI]     [XFree86]

  Powered by Linux