Search Postgresql Archives

Re: case insensitive collation of Greek's sigma

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Am 26.11.21 um 08:37 schrieb Jakub Jedelsky:
Hello,

during our tests of Postgres with ICU we found an issue with ILIKE of upper and lowercase sigma (Σ). The letter has two lowercase variants σ and ς (at the end of a word). I'm working with en_US and en-US-x-icu collations and results are a bit unexpected - they are inverted:

postgres=# SELECT
postgres-# 'ΣΣ' ILIKE 'σσ' COLLATE "en_US",
postgres-# 'ΣΣ' ILIKE 'σς' COLLATE "en_US"
postgres-# ;
 ?column? | ?column?
----------+----------
 t        | f
(1 row)

postgres=# SELECT
postgres-# 'ΣΣ' ILIKE 'σσ' COLLATE "en-US-x-icu",
postgres-# 'ΣΣ' ILIKE 'σς' COLLATE "en-US-x-icu";
 ?column? | ?column?
----------+----------
 f        | t
(1 row)

I run those commands on the latest (14.1) official docker image.

Is it possible to unify the behaviour?And which one is correct from the community point of view?

If I could start, I think both results are wrong as both should return True. If I got it right, in the background there is a lower() function running to compare strings, which is not enough for such cases (until the left side isn't taken as a standalone word).

Thanks,

- jj

Have you seen the subtle intricacies in this example?

=> SELECT 'ΣΣ Μ' ILIKE 'σσ Μ' COLLATE "en_US" AS c0,
          'ΣΣ Μ' ILIKE 'σς Μ' COLLATE "en_US" AS c1,
          'ΣΣ Μ' ~*    'σσ Μ' COLLATE "el-GR-x-icu" AS c2,
          'ΣΣ Μ' ~*    'σς Μ' COLLATE "el-GR-x-icu" AS c3,
          'ΣΣ Μ' ILIKE 'σσ Μ' COLLATE "el-GR-x-icu" AS c4,
          'ΣΣ Μ' ILIKE 'σς Μ' COLLATE "el-GR-x-icu" AS c5,
          'ΣΣ Μ' ~*    'σσ Μ' COLLATE "en-US-x-icu" AS c6,
          'ΣΣ Μ' ~*    'σς Μ' COLLATE "en-US-x-icu" AS c7,
          'ΣΣ Μ' ILIKE 'σσ Μ' COLLATE "en-US-x-icu" AS c8,
          'ΣΣ Μ' ILIKE 'σς Μ' COLLATE "en-US-x-icu" AS c9;
 c0 | c1 | c2 | c3 | c4 | c5 | c6 | c7 | c8 | c9
----+----+----+----+----+----+----+----+----+----
 t  | f  | t  | t  | f  | t  | t  | t  | f  | t
(1 row)

Obviously, the ILIKE operator is really strict regarding to the correct letter at the end of the word. The regular expression operator works as you expected.

Happy computing...
Frank





[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
[Index of Archives]     [Postgresql Jobs]     [Postgresql Admin]     [Postgresql Performance]     [Linux Clusters]     [PHP Home]     [PHP on Windows]     [Kernel Newbies]     [PHP Classes]     [PHP Databases]     [Postgresql & PHP]     [Yosemite]

  Powered by Linux