Re: PostgreSQL, UTF-8 and Mac OS X

On Mon, Nov 07, 2005 at 02:28:05PM +0100, Guido Neitzer wrote:
> I think I was the one who asked.
> 
> I worked on my locale problem on the weekend and was able to build a  
> LC_COLLATE file, that actually works with ISO locales, but not with  
> UTF-8 (50% progress ... ;-)).

I guess the problem is that you would have to import the entire Unicode
database to make it work. I think the code is multibyte-aware, though;
it's just that no one has done the work.

Disclaimer: I'm working with Linux/glibc, which has had proper collation
for quite a while now, so I have no real understanding of systems that
don't have it.

> When you test the UNIX utility "sort" on Mac OS X, you should be  
> aware, that the pre-installed version on Mac OS X ignores locales at  
> all ... :-( I had to install the gnu coreutils to get a sort that  
> works with locales, and this also fails on UTF-8 but works with ISO  
> encoding/collate - same as PG does.

Nasty.
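
To make the difference concrete, here is a minimal Python sketch of a
sort that ignores the locale next to one that uses the platform's
collation. The word list, the function names and the de_DE.UTF-8 locale
name are only illustrative assumptions; the locale may not be installed,
and on a system whose C library has no UTF-8 collation the second result
may be no better than the first.

```python
import locale

def codepoint_sort(words):
    # Plain code point order: no locale involved at all.
    return sorted(words)

def locale_sort(words, locale_name):
    # Sort using the C library's collation for locale_name.
    # Returns None if the locale is not available on this system.
    saved = locale.setlocale(locale.LC_COLLATE)  # query the current setting
    try:
        locale.setlocale(locale.LC_COLLATE, locale_name)
    except locale.Error:
        return None
    try:
        return sorted(words, key=locale.strxfrm)
    finally:
        # Restore whatever collation locale was active before.
        locale.setlocale(locale.LC_COLLATE, saved)

words = ["zebra", "Äpfel", "apple"]
print(codepoint_sort(words))               # "Äpfel" lands after "zebra"
print(locale_sort(words, "de_DE.UTF-8"))   # platform dependent, or None
```

With working collation one would expect "Äpfel" to sort next to "apple";
whether it actually does depends entirely on the platform's locale data.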

> Now I'm not sure, whether my own LC_COLLATE file is not appropriate  
> for UTF-8 (why not?) or whether Mac OS X locale does not support  
> UTF-8 at all as you state.

Hmm, I just went back to the source code (adv_cmds-79.1) and it looks
like collation doesn't support UTF-8, or any other multibyte encoding,
at all.

> Will be cool to have locale support directly in PostgreSQL.

Yeah, but it seems a bit lame for an operating system to claim to
support multibyte locales if it can't do collation on them. :( It
supports everything but collation, so that's obviously not a priority.

> So, just a quick question regarding a switch: is there a problem with  
> using ISO8859-15 for now, and do a switch later with dumping the data  
> and import it to a newer version which should then use UTF-8? Do I  
> need to do some conversion or how does this work?

If you import as ISO8859-15 now, then when you do the upgrade, simply
set the client encoding to ISO8859-15 (PostgreSQL calls it LATIN9) for
the reload, and PostgreSQL will convert it all to UTF-8 during the load.
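
As a rough illustration of what that conversion amounts to at the byte
level, here is a small Python sketch. It is not PostgreSQL's
implementation, and the function name and sample bytes are made up for
the example; it just shows why the client encoding has to name the
encoding the dump is really in.

```python
def latin9_to_utf8(raw: bytes) -> bytes:
    # Interpret the bytes as ISO8859-15 (LATIN9), then re-encode as UTF-8.
    return raw.decode("iso8859-15").encode("utf-8")

# Hypothetical sample: "Müller, 10 €" as it would be stored in ISO8859-15.
sample = "Müller, 10 €".encode("iso8859-15")
converted = latin9_to_utf8(sample)

print(sample)     # single bytes for the u-umlaut and the euro sign
print(converted)  # the same text as multibyte UTF-8 sequences

# Declaring the wrong source encoding silently changes the text:
# byte 0xA4 is the euro sign in ISO8859-15 but a currency sign in ISO8859-1.
print(b"\xa4".decode("iso8859-15"), b"\xa4".decode("iso8859-1"))
```

So as long as the declared client encoding matches the data, nothing
should be lost going from ISO8859-15 to UTF-8.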

Have a nice day,
-- 
Martijn van Oosterhout   <kleptog@xxxxxxxxx>   http://svana.org/kleptog/
> Patent. n. Genius is 5% inspiration and 95% perspiration. A patent is a
> tool for doing 5% of the work and then sitting around waiting for someone
> else to do the other 95% so you can sue them.


