Re: unexpected (to me) sorting order

Glyn Astill <glynastill@xxxxxxxxxxx> · Thu, 9 Apr 2015 12:43:56 +0000 (UTC)

> From: Scott Marlowe <scott.marlowe@xxxxxxxxx>
 > To: Glyn Astill <glynastill@xxxxxxxxxxx>
 > Cc: Björn Lundin <b.f.lundin@xxxxxxxxx>; "pgsql-general@xxxxxxxxxxxxxx" <pgsql-general@xxxxxxxxxxxxxx>
 > Sent: Thursday, 9 April 2015, 13:23
 > Subject: Re:  unexpected (to me) sorting order
 > 
> On Wed, Apr 8, 2015 at 3:33 AM, Glyn Astill <glynastill@xxxxxxxxxxx> 
> wrote:
> 
>> 
>>>  From: Björn Lundin <b.f.lundin@xxxxxxxxx>
>>> To: pgsql-general@xxxxxxxxxxxxxx
>>> Sent: Wednesday, 8 April 2015, 10:09
>>> Subject: [GENERAL] unexpected (to me) sorting order
>>> 
>>> select * from T_SORT order by NAME ;
>>> 
>>> rollback;
>>>  id |        name
>>> ----+--------------------
>>>   1 | FINISH_110_150_1
>>>   2 | FINISH_110_200_1
>>>   3 | FINISH_1.10_20.0_3
>>>   4 | FINISH_1.10_20.0_4
>>>   5 | FINISH_1.10_30.0_3
>>>   6 | FINISH_1.10_30.0_4
>>>   7 | FINISH_120_150_1
>>>   8 | FINISH_120_200_1
>>> (8 rows)
>>> 
>>> why is FINISH_1.10_20.0_3 between
>>>  FINISH_110_200_1 and
>>>  FINISH_120_150_1
>>> ?
>>> 
>>> That is why is '.' between 1 and 2 as in 110/120 ?
>>> 
>>> 
>>> pg_admin III reports the database is created like
>>> CREATE DATABASE bnl
>>>   WITH OWNER = bnl
>>>        ENCODING = 'UTF8'
>>>        TABLESPACE = pg_default
>>>        LC_COLLATE = 'en_US.UTF-8'
>>>        LC_CTYPE = 'en_US.UTF-8'
>>>        CONNECTION LIMIT = -1;
>>> 
>>> 
>> 
>> 
>> 
>>  The collation of your "bnl" database is utf8, so the 
> "." punctuation character is seen as a "variable element" 
> and given a lower weighting in the sort to the rest of the characters.  
> That's just how the collate algorithm works in UTF8.
> 
> utf8 is an encoding method, not a collation. The collation is en_US,
> encoded in utf8. You can use C collation with utf8 encoding just fine.
> So just replace UTF8 with en_US in your sentence and you've got it
> right.
> 

Yes, thanks for the correction there, and we're talking about the wider unicode collate algorithm.