Re: Encoding, Unicode, locales, etc.

Karsten Hilbert <Karsten.Hilbert@xxxxxxx> · Wed, 1 Nov 2006 23:52:06 +0100



On Wed, Nov 01, 2006 at 08:50:30PM +0100, Martijn van Oosterhout wrote:

> > Could this paragraph be put into the docs and/or the FAQ,
> > please? Along with the recommendation that if you require
> > multiple encodings for your databases you had better have
> > your OS locale configured properly for UTF8 and use UNICODE
> > databases, or do initdb with the C locale.
> 
> Err, multiple encodings don't work full-stop.
Well, yes, I was thinking of multiple client encodings.
Those can be supported either by doing initdb with the C
locale and creating each database with the encoding you
require (sorting etc. won't work, I know), or by doing
initdb with a UTF8 locale and using UNICODE databases. In
either case several client encodings can be used, as long
as conversion to and from the database encoding is
possible. Sorting etc. may still be wrong, but at least the
proper characters are going in and coming back.

> Any particular locale (as
> defined by POSIX) is only really designed to work with one encoding.
Sure. What I meant is that if you have a unicode database
you can use several client encodings and get back the
properly encoded characters.
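
A minimal sketch of the idea, in plain Python rather than
anything PostgreSQL-specific: the text is held as UTF-8 and
re-encoded for each client on the way out. The function names
(convert_for_client, can_convert) and the sample strings are
made up for illustration; this only mimics the principle, not
how the server actually implements conversion.

```python
def convert_for_client(stored_utf8: bytes, client_encoding: str) -> bytes:
    """Re-encode UTF-8 bytes into the encoding a client asked for."""
    return stored_utf8.decode("utf-8").encode(client_encoding)


def can_convert(stored_utf8: bytes, client_encoding: str) -> bool:
    """Report whether every stored character exists in the client encoding."""
    try:
        convert_for_client(stored_utf8, client_encoding)
        return True
    except UnicodeEncodeError:
        return False


# Hypothetical stored values, both kept as UTF-8.
german = "Grüße".encode("utf-8")
polish = "Łódź".encode("utf-8")

# Each client gets bytes in its own encoding.
german_latin1 = convert_for_client(german, "latin-1")
polish_latin2 = convert_for_client(polish, "iso-8859-2")

# Conversion only works if the target encoding has the characters:
# Latin-1 has no "Ł", so this combination is not possible.
polish_fits_latin1 = can_convert(polish, "latin-1")
```

The last line is the "as long as conversion is possible"
caveat: a Latin-1 client cannot receive text that only exists
in Latin-2 or beyond.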

> The fact that the C locale produces an order when sorting UTF8 text is
> really just luck.
Yes.
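
For illustration, a small Python sketch of what that order
looks like. It assumes that a C-locale comparison boils down
to comparing raw bytes, and relies on the fact that UTF-8
byte order matches Unicode code point order. The word list is
made up.

```python
# Hypothetical sample data mixing ASCII and non-ASCII initials.
words = ["zebra", "Äpfel", "apple", "Zürich", "éclair"]

# Byte-wise comparison of the UTF-8 form, which is what a
# C-locale sort is assumed to do here.
by_bytes = sorted(words, key=lambda w: w.encode("utf-8"))

# Python compares str values by code point.
by_code_point = sorted(words)

# Same order both ways, but not a dictionary order:
# uppercase ASCII first, then lowercase ASCII, then accented letters.
same_order = by_bytes == by_code_point
print(by_bytes)
```

So the result is stable and well defined, but "Äpfel" lands
after "zebra", which is not what a German or French user
would expect.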

> > Here are a few data points from my Debian/Testing system in
> > favour of not worrying too much about installed ICU size as
> > it is being used by other packages anyways:
> 
> We'd need a suitable patch first before we start worrying about that. I
> think diskspace is less of an issue now.
Well, size did come up in a "recent" discussion so I thought
I'd mention the above facts.

Karsten
-- 
GPG key ID E4071346 @ wwwkeys.pgp.net
E167 67FD A291 2BEA 73BD  4537 78B9 A9F9 E407 1346