Martijn van Oosterhout <kleptog@xxxxxxxxx> writes:
> On Sat, Feb 18, 2006 at 08:16:07PM -0800, Bill Moseley wrote:
> > Is the Holy Grail encoding and lc_collate settings per column?
>
> By way of example, see ICU which is an internationalisation library
> we're considering to get consistent locale support over all platforms.
> It supports one encoding, namely UTF-16. It has various functions to
> convert other encodings to or from that, but internally it's all
> UTF-16. So if we do use that, then all encodings (except native UTF-16)
> will need conversion all the time, so you don't buy anything by
> having the server in some random encoding.

Ugh. At least from my perspective that makes it a non-starter. As I'm
sure you realize, storage density is a major factor, often the dominant
factor, in database performance. Anything that would double the storage
size for ASCII foreign keys is going to be a terrible hit.

And having to do an ASCII->UTF-16 conversion for every foreign key
constraint check would be nearly as bad. I know it's a simple
conversion, but compared to a simple strcmp in a critical code path it's
going to increase CPU usage significantly.

I'm still unclear what advantage adding yet another external library
dependency gains Postgres in this area. The bulk of the difficulties
seem to be on the user interface side, where it's unclear how to let
users control this functionality. It seems like the actual mechanics of
sorting in various locales can be handled using standard libc i18n
functions.

The one issue people have raised is that traditional libc functions
require switching a global state between locales, and not all
implementations support that well. But depending on a single
non-standard extension seems better than depending on a huge external
library. Especially when the only consequence of that non-standard
extension being missing is that performance will suffer in a case
Postgres currently doesn't handle at all.

--
greg
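
To put a number on the storage-density concern above, here is a minimal
sketch in Python. It is not Postgres or ICU code. The key value and the
helper name storage_ratio are made up for illustration, and it only
compares the encoded byte lengths of the same ASCII string.

```python
# Hypothetical ASCII foreign-key value, chosen only for illustration.
key = "ORD-2006-000123"

# A single-byte-compatible encoding stores one byte per ASCII character.
utf8_bytes = key.encode("utf-8")

# UTF-16 stores two bytes per ASCII character ("-le" avoids adding a BOM).
utf16_bytes = key.encode("utf-16-le")


def storage_ratio(text: str) -> float:
    """Return UTF-16 size divided by UTF-8 size for the given text."""
    return len(text.encode("utf-16-le")) / len(text.encode("utf-8"))


print(len(utf8_bytes), len(utf16_bytes), storage_ratio(key))
```

For pure-ASCII text the ratio is exactly 2. The sketch says nothing
about the CPU cost of converting on every comparison, which would need a
real benchmark.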