Re: utf-8 and cultural sensitive sorting

> It depends on which language you want to sort. Many languages do not
> have a sort alphabet; Japanese is one example. It can be quite
> difficult to sort languages like this. I am not aware of any standard
> technique for sorting Japanese text other than keeping an arbitrarily
> ordered dictionary (perhaps taken from whichever Japanese dictionary
> is most popular at the time) and then doing hash lookups in it for
> the indexing values. As you can imagine, this is not particularly
> fast. I have not actually tried this, but I expect PostgreSQL will
> simply sort in a fairly binary fashion: according to the binary value
> of the characters, or the UTF-8 offsets, or something like that.

The above is almost correct, but sorting by JIS code order is usually
enough for most Japanese applications (I believe the same can be said
of Chinese). I do not recommend using the locale for sorting Japanese:
locale support for multibyte encodings is quite frequently totally
broken. See the recent posting titled "Japanese words not
distinguished" for more details.

If you have to live with a UTF-8 database, I recommend turning off
locale support and using CONVERT to sort Japanese. For example:

SELECT * FROM t1 ORDER BY CONVERT(col1 USING utf_8_to_euc_jp);
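
To see why converting to EUC-JP before sorting changes the result, here
is a minimal Python sketch. It is only an illustration outside the
database, not what the server does internally: the sample words and the
helper names utf8_key and euc_jp_key are made up for this example. It
compares plain byte order of the UTF-8 form with byte order of the
EUC-JP form, which follows JIS code order for these characters.

```python
# Illustration only: compare byte-wise ordering of the same strings
# under UTF-8 and under EUC-JP. Helper names are hypothetical.

def utf8_key(s: str) -> bytes:
    """Sort key: raw bytes of the UTF-8 encoding (Unicode code point order)."""
    return s.encode("utf-8")

def euc_jp_key(s: str) -> bytes:
    """Sort key: raw bytes of the EUC-JP encoding (JIS code order here)."""
    return s.encode("euc_jp")

# Sample data (an assumption for this sketch): two kanji, one katakana,
# one hiragana.
words = ["一", "ア", "亜", "あ"]

by_utf8 = sorted(words, key=utf8_key)
by_euc = sorted(words, key=euc_jp_key)

print("UTF-8 byte order: ", by_utf8)
print("EUC-JP byte order:", by_euc)
```

Kana come before kanji in both orderings, but the two kanji swap places:
亜 is the first kanji in JIS X 0208, so it sorts ahead of 一 under
EUC-JP, while Unicode places 一 (U+4E00) ahead of 亜 (U+4E9C).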

> On 12 Jul 2005, at 15:48, <sknipe@xxxxxxxxxx> wrote:
> 
> > Our product will be storing its character data in UTF-8 format
> > (Unicode encoding).
> >
> > What is the best way to achieve culturally sensitive sorting using
> > the UTF-8 data?
> >
> > Is it possible to have the locale apply to a connection?
> >
> > If so, is the cultural sorting support mature in PostgreSQL?
> >
> > What type of performance can be expected as compared with
> > normal C locale sorting?
> >
> > Thanks very much,
> >
> > Steve.