Search Postgresql Archives

Re: case insensitive collation of Greek's sigma

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On 26.11.21 08:37, Jakub Jedelsky wrote:
postgres=# SELECT
postgres-# 'ΣΣ' ILIKE 'σσ' COLLATE "en_US",
postgres-# 'ΣΣ' ILIKE 'σς' COLLATE "en_US"
postgres-# ;
  ?column? | ?column?
----------+----------
  t        | f
(1 row)

postgres=# SELECT
postgres-# 'ΣΣ' ILIKE 'σσ' COLLATE "en-US-x-icu",
postgres-# 'ΣΣ' ILIKE 'σς' COLLATE "en-US-x-icu";
  ?column? | ?column?
----------+----------
  f        | t
(1 row)

If I could start, I think both results are wrong as both should return True. If I got it right, in the background there is a lower() function running to compare strings, which is not enough for such cases (until the left side isn't taken as a standalone word).

The reason for these results is that for multibyte encodings, a ILIKE b basically does lower(a) LIKE lower(b), and

select lower('ΣΣ' COLLATE "en_US"), lower('ΣΣ' COLLATE "en-US-x-icu");
 lower | lower
-------+-------
 σσ    | σς

Running lower() like this is really the wrong thing to do. We should be doing "case folding" instead, which normalizes these differences for the purpose of case-insensitive comparisons.





[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
[Index of Archives]     [Postgresql Jobs]     [Postgresql Admin]     [Postgresql Performance]     [Linux Clusters]     [PHP Home]     [PHP on Windows]     [Kernel Newbies]     [PHP Classes]     [PHP Databases]     [Postgresql & PHP]     [Yosemite]

  Powered by Linux