8.3.0, locales, and encodings

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



I got a small surprise when upgrading to 8.3.0 today (CentOS 4.4). When I went to restore my database, PSQL complained that my "CREATE DATABASE... ENCODING 'UTF8';" wasn't valid (the new phpBB 3.0 software enforces using UTF8). Sure enough, a bit of research showed that the supplied /etc/init.d/postgresql file had no encoding specified, which meant that it defaulted to my operating system's 'en_US.ISO-8859-1' encoding. A quick reading of the PostgreSQL documentation revealed that PostgreSQL can't really use an encoding different from that specified during INITDB (caveat below), and apparently that is now enforced in v8.3.

OK, so I manually run INITDB w/ "-E UTF8", and now I can create/load one of my databases. Unfortunately, when I go to create another database (where the data is 'ISO-8859-1'), now IT won't create. A more detailed reading of the PostgreSQL documentation revealed that PostgreSQL can't use an encoding different from that specified during INITDB, UNLESS INITDB is run w/ "-locale POSIX".

Ok, I rerun INITDB w/ "-locale POSIX -E UTF8", and now both databases create/load, apparently properly. I also did an INITDB on another database server, running INITDB w/ just "-locale POSIX" (apparently meaning that the "template1" encoding defaulted to "SQL_ASCII"), and successfully create/reload another database there (explicitly specifying an ENCODING there).

Questions/suggestions:

1. The provided /etc/init.d/postgresql file seems to be part of this trap. Should it be changed to default the encoding to POSIX? 2. I want my installation to have the maximum flexibility. Should I be running INITDB w/ "-locale POSIX -E UTF8", or just "-locale POSIX" (implying "-E SQL_ASCII")? Or does it make a difference, since I always specify the desired database encoding when I "CREATE DATABASE... ENCODING '...';"? 3. One of my databases is ISO-8859-1, because that's the format of the data I get from the US gov't. However, should I instead use the 'UTF8' encoding for that database, and then use a client (PSQL) encoding of "LATIN1", and let PostgreSQL convert the data? Pros & cons?

Performance is a concern, but disk space is not.

I presume this ("admin") is the proper PostgreSQL list to post this to. If not, let me know.

-- Dean Gibson

---------------------------(end of broadcast)---------------------------
TIP 5: don't forget to increase your free space map settings

[Index of Archives]     [KVM ARM]     [KVM ia64]     [KVM ppc]     [Virtualization Tools]     [Spice Development]     [Libvirt]     [Libvirt Users]     [Linux USB Devel]     [Linux Audio Users]     [Yosemite Questions]     [Linux Kernel]     [Linux SCSI]     [XFree86]

  Powered by Linux