On Sat, Feb 18, 2006 at 01:40:09PM -0500, Tom Lane wrote:
> Bill Moseley <moseley@xxxxxxxx> writes:
> > - To clarify the first point, if the database is encoded utf-8 and
> > lc_collate is en_US then Postgresql does NOT try to convert utf-8 to
> > 8859-1 before sorting.
>
> Basically, this is a horribly bad idea and you should never do it.
> The database encoding should always match what the locale assumes
> for its character set (unless the locale is "C", which doesn't care).

What's a bad idea? Having an lc_collate on the cluster that doesn't
support the encodings of the databases in it?

> We'd enforce that you never do it if we knew a portable way to determine
> the character set assumed by an LC_COLLATE setting.

Again, I'm not sure what "it" is, but I do find it confusing that the
cluster can have only one lc_collate while the databases on that cluster
can have more than one encoding. That's why I was asking how PostgreSQL
handles (possibly) different encodings.

Are you saying that if a database is encoded as utf8, then the cluster
should be initialized with something like en_US.utf8? And then all
databases on that cluster should use that same encoding?

I suspect I don't understand how LC_COLLATE works all that well. I
thought the locale defines the order of the characters, but not the
encoding of those characters. Maybe that's not correct. I assumed the
same locale would sort the same characters the same way, regardless of
which encoding represents them. Maybe that's not the case:

$ LC_ALL=en_US.UTF-8 locale charmap
UTF-8
$ LC_ALL=en_US locale charmap
ISO-8859-1
$ LC_ALL=C locale charmap
ANSI_X3.4-1968

-- 
Bill Moseley
moseley@xxxxxxxx
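The `locale charmap` output above is the key to the question: en_US assumes ISO-8859-1, while en_US.UTF-8 assumes UTF-8. The minimal Python sketch below (not PostgreSQL code; the helper `bytes_as_seen_by` is hypothetical and exists only for illustration) shows that the same text is stored as different bytes in the two encodings, so a collation that assumes ISO-8859-1 would read UTF-8 bytes as different characters and sort on those instead.

```python
# Illustration only: the same characters have different byte
# representations in UTF-8 and ISO-8859-1. A locale reads raw bytes
# using its own assumed charset, so a mismatch changes which
# characters the collation routine thinks it is comparing.

def bytes_as_seen_by(text: str, db_encoding: str, locale_charset: str) -> str:
    """Encode text the way the database stores it, then decode the raw
    bytes the way a locale assuming `locale_charset` would read them."""
    raw = text.encode(db_encoding)
    # ISO-8859-1 maps every byte to some character, so this never fails
    # for that charset; errors="replace" guards other charsets.
    return raw.decode(locale_charset, errors="replace")


word = "café"
print(word.encode("utf-8"))       # b'caf\xc3\xa9'
print(word.encode("iso-8859-1"))  # b'caf\xe9'

# UTF-8 database, en_US (ISO-8859-1) locale: one 'é' is read as 'Ã©'.
print(bytes_as_seen_by(word, "utf-8", "iso-8859-1"))  # cafÃ©

# Matching encoding and locale charset: the text is read correctly.
print(bytes_as_seen_by(word, "utf-8", "utf-8"))       # café
```

In other words, the locale does define an ordering of characters, but it can only apply that ordering to bytes decoded with the charset it assumes, which is why the database encoding needs to match the locale's charmap (the "C" locale excepted, since it compares plain bytes).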