On Sunday 04 April 2004 10:50 pm, Tom Lane wrote:
> Troels Arvin <troels@arvin.dk> writes:
> > In the init-script contained in the RPMs downloadable from the PostgreSQL
> > site (I checked the one for Fedora), an explicit locale is set before
> > running initdb. - And the explicit locale is not "C".
> Only if you don't have a sysconfig file:
> # Just in case no locale was set, use en_US
> [ ! -f /etc/sysconfig/i18n ] && echo "LANG=en_US" > $PGDATA/../initdb.i18n
> I agree though that it seems like a bad choice to default to en_US
> rather than C. Lamar, any reason why it's like that?

Yes. A bit of history before I enclose an e-mail from Trond Eivind Glomsrød
(former Red Hat internal PostgreSQL RPM maintainer) on the subject. I am only
enclosing a single e-mail of an exchange that occurred over a period of a
couple of weeks; I have pretty much the whole exchange archived if you want
to read more, although I cannot reveal the whole exchange due to some NDA
stuff in it. Although it might be OK at this point, since that was, after
all, 3 years ago.

Back in PostgreSQL 7.1 days, locale settings and the issue of a database
being initdb'ed in one locale and the postmaster starting in another locale
reared its head. I 'solved' the issue by hardcoding LC_ALL=C in the
initscript. This had the side-effect of making the regression tests pass.
Trond wasn't happy with my choice of C locale, and here is why:

Re: Thought you might find this very interesting.
From: teg@redhat.com (Trond Eivind Glomsrød)
To: Lamar Owen <lamar.owen@wgcr.org>

Lamar Owen <lamar.owen@wgcr.org> writes:

> On Friday 25 May 2001 15:04, you wrote:
> > Lamar Owen <lamar.owen@wgcr.org> writes:
> > > > I also intend to kill the output from database initialization.
>
> > > I thought you had, at least in the RedHat 7.1 7.0.3 set.
>
> > Yup, but it has started showing up again in PostgreSQL 7.1.x
>
> I need to sync that in with this set.

I've fixed a couple of issues with the initscript, I'll send it to you
when it's finished.... even after sourcing a file with locale values, the
postmaster process doesn't seem to respect it. I'll need to make this work
before I build (I've confirmed that the current way of handling this,
using "C", is not acceptable). The locale needs to be different, and if
that causes problems for pgsql, it's a bug in pgsql which needs fixing -
handling other aspects, like ordering, in a bad way isn't an acceptable
workaround.

> > "C" equals broken for non-English locales, and isn't an acceptable choice.
>
> That is one argument I'll not be involved in, as I'm so used to the ASCII
> sequence that it is second-nature, thus disqualifying me from commenting on
> any collation issues.

1) It's not a valid choice for English - if you're looking in a
   lexicon, you'll find Aspen, bridge, Cambridge, not Aspen,
   Cambridge, bridge.
2) It's much worse in other locales... it gets the order of
   characters wrong as well.

Here is a test:

create table bar(
        ord varchar(40),
        foo int,
        primary key(ord));
insert into bar values('ære',2);
insert into bar values('åre',3);
insert into bar values('are',4);
insert into bar values('zsh',5);
insert into bar values('begynne',6);
insert into bar values('øve',7);
select ord,foo from bar order by ord;

Here is a valid result:

 are     |  4
 begynne |  6
 zsh     |  5
 ære     |  2
 øve     |  7
 åre     |  3

Here is an invalid result:

 are     |  4
 begynne |  6
 zsh     |  5
 åre     |  3
 ære     |  2
 øve     |  7

The last one is what you get with LANG=C - as you can see, the ordering
of the Norwegian characters is wrong. The same would be the issue for
pretty much any non-English characters - their number in the character
table (as used by C) is not the same as their location in the local
alphabet (as used by the local locale).

--
Trond Eivind Glomsrød
Red Hat, Inc.

So there is a reason it is like it is.
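
As an aside, the byte-order behaviour in Trond's test can be seen outside the
database as well. The following is only a minimal sketch, assuming a POSIX
shell, a standard sort utility, and that the script itself is saved as UTF-8;
the variable names are made up for illustration. With LC_ALL=C, sort compares
raw bytes, so the three Norwegian letters come out in character-table order
(å, æ, ø) rather than alphabet order (æ, ø, å), matching the "invalid result"
above.

```shell
# The six test words from the message, one per line (UTF-8 assumed).
words='ære
åre
are
zsh
begynne
øve'

# LC_ALL=C makes sort compare raw bytes instead of using locale rules,
# so å, æ and ø are ordered by their position in the character table.
c_sorted=$(printf '%s\n' "$words" | LC_ALL=C sort)

printf '%s\n' "$c_sorted"
```

On a system with a Norwegian locale installed, running the same pipeline with
that locale in LC_ALL should give the order Trond calls valid; the locale name
varies between systems, so none is hard-coded here.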

If you want to change that on your local installation, you will have to
re-initdb in the C locale (and edit /var/lib/pgsql/initdb.i18n accordingly,
and be prepared for collation differences and problems). The initial initdb
is done in the system locale, unless one does not exist, in which case en_US
is used (again, so that when you do store non-English characters you get
sane ordering, and so that you get the mixed-case ordering preferred by many
people). The initdb locale settings are stored in initdb.i18n, and they are
re-sourced every time postgresql is started, to prevent data corruption if
the postmaster is started with a different locale from the one used for
initdb.

Tom, is the data corruption issue still an issue with 7.4.x, or is this just
historical? It has been a long time since I've looked in this corner of the
RPM.... :-)
--
Lamar Owen
Director of Information Technology
Pisgah Astronomical Research Institute
1 PARI Drive
Rosman, NC 28772
(828)862-5554
www.pari.edu
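
For readers who have not looked at the initscript, the record-then-re-source
idea described above can be sketched roughly as follows. This is not the
actual RPM initscript: the function names, the temporary directory, and the
choice to copy the system i18n file when it exists are all assumptions made
for illustration. Only the en_US fallback line and the initdb.i18n file name
come from the thread.

```shell
# Hypothetical helper: record the locale that initdb will run under.
# $1 = system i18n file (e.g. /etc/sysconfig/i18n), $2 = file to write.
record_initdb_locale() {
    if [ -f "$1" ]; then
        cp "$1" "$2"                 # assumption: reuse the system settings
    else
        echo "LANG=en_US" > "$2"     # fallback quoted in the thread
    fi
}

# Hypothetical helper: re-source the recorded settings before each start,
# so the postmaster sees the same locale that initdb used.
load_initdb_locale() {
    . "$1"
    export LANG LC_ALL LC_COLLATE LC_CTYPE
}

# Demo in a scratch directory, with no system i18n file present.
workdir=$(mktemp -d)
record_initdb_locale "$workdir/no-such-i18n" "$workdir/initdb.i18n"
recorded=$(cat "$workdir/initdb.i18n")

# Load inside a subshell so the calling shell's locale is left untouched.
started_lang=$(load_initdb_locale "$workdir/initdb.i18n"; printf '%s' "$LANG")
echo "recorded: $recorded; postmaster would start with LANG=$started_lang"

rm -rf "$workdir"
```

The point of the design is that the locale is captured once, at initdb time,
and every later start reads it back from the file instead of trusting
whatever environment the script happens to be started in.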