
Re: Mixing different LC_COLLATE and database encodings



> On Sat, Feb 18, 2006 at 08:16:07PM -0800, Bill Moseley wrote:
> > Is the Holy Grail encoding and lc_collate settings per column?
> 
> Well yes. I've been trying to create a system where you can handle
> multiple collations in the same database. I posted the details to
> -hackers and got part of the way, but it's a lot of work.
> 
> As for encodings, to be honest, I'm not sure whether it's a great idea
> to support multiple encodings simultaneously. Things become a lot
> easier if you know everything is the same encoding. If you set the
> client_encoding automatically on startup it has pretty much the same
> effect as having the server always use that encoding. It's just a bit
> of time wasted in conversion, but the client doesn't need to care.
> 
> By way of example, see ICU, which is an internationalisation library
> we're considering to get consistent locale support over all platforms.
> It supports one encoding, namely UTF-16. It has various functions to
> convert other encodings to or from that, but internally it's all
> UTF-16. So if we do use that, then all encodings (except native UTF-16)
> will need conversion all the time, so you don't buy anything by
> having the server in some random encoding.
> 
> The problem of course being that the SQL standard requires some encoding
> support. No-one has really come up with a proposal for that yet. IMHO,
> that's a parser issue more than anything else.

If you are considering allowing only UTF-16, or any other single
encoding, in the backend, I am strongly against the idea. We Japanese
need native support for our local encodings. Converting between those
encodings and Unicode every time the backend and frontend communicate
would be a serious performance hit. Moreover, the conversion is known
not to be round-trip safe, which means some information will be lost
during the conversion. Another point is the on-disk format. UTF-16 will
require more storage than local encodings. UTF-8 will probably require
more.
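
To make the storage point concrete, here is a rough sketch in Python that
counts the bytes one short string needs in several encodings. The sample
string, the helper name encoded_sizes, and the choice of EUC-JP and
Shift_JIS as the "local encodings" are assumptions for illustration only.
It measures encoded length, not PostgreSQL's actual on-disk format.

```python
# Hypothetical helper: byte length of the same text in several encodings.
# EUC-JP and Shift_JIS stand in for "local encodings" (an assumption).
ENCODINGS = ("euc_jp", "shift_jis", "utf-16-le", "utf-8")


def encoded_sizes(text):
    """Return a dict mapping encoding name to encoded byte count."""
    return {name: len(text.encode(name)) for name in ENCODINGS}


if __name__ == "__main__":
    # 11 ASCII characters followed by 3 kanji (illustrative sample only).
    sample = "PostgreSQL 日本語"
    for name, size in encoded_sizes(sample).items():
        print(f"{name:10s} {size:3d} bytes")
    # euc_jp      17 bytes
    # shift_jis   17 bytes
    # utf-16-le   28 bytes
    # utf-8       20 bytes
```

For this mixed sample, both Unicode encodings come out larger than the
local ones: UTF-16 doubles every ASCII character, and UTF-8 spends three
bytes on each kanji where EUC-JP and Shift_JIS spend two. The ratios
depend on the text, so real data may look different.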

I have a feeling that ICU is good for applications, but is not for
DBMSs.
--
Tatsuo Ishii
SRA OSS, Inc. Japan

