Search Postgresql Archives

Re: SQL Error [0A000]: ERROR: nondeterministic collations are not supported for LIKE

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Daniel, thanks for the reply.  I believe we will just remove the collation, allow LIKE to function normally, and wait for a future patch is one is ever provided.

On Thu, Oct 24, 2019 at 3:39 PM Daniel Verite <daniel@xxxxxxxxxxxxxxxx> wrote:
        Jeff Lanzarotta wrote:

> I have a question about nondeterministic collations in PostgreSQL 12.  I
> have created a new collation that is nondeterministic and created several
> columns which use this collation.  Querying these columns works great until
> I use LIKE.  When I do, I get the following error:
>
> SQL Error [0A000]: ERROR: nondeterministic collations are not supported for
> LIKE
>
> Is there any plan to allow this functionality?

PostgreSQL development is conducted without a roadmap [1]. Maybe
someone will submit a patch to enable LIKE with nondeterministic
collations, but so far it did not happen according to the current set
of patches at https://commitfest.postgresql.org

Such matches can be weirder than you might think (not to
mention much slower).
Consider for instance a collation that ignores punctuation:

CREATE COLLATION "nd3alt" (
  provider = 'icu',
  locale='und@colAlternate=shifted',
  deterministic = false
);

In the icu_ext extension, icu_strpos [2] can match a substring with
a nondeterministic collation, which is one part of what LIKE
would need to do for such collations. The implementation uses
the string search facility of the ICU library.

With the above-defined collation, we can have for instance:

SELECT icu_strpos('abc.  ...de', 'c,d' COLLATE nd3alt);
 icu_strpos
------------
          3

So even though 'c,d' is not a substring of 'abc.  ...de' in the common
sense, it is recognized as such by this collation, by design.

A LIKE operator for nondeterministic collations should be able to
recognize this too, but with an arbitrary number of substrings to
match in the pattern, plus it should handle the underscore wildcard
in a way that hopefully makes sense.

With the example above,
   'abc.  ...de' LIKE '%c,d%' COLLATE nd3alt
should certainly be a match, but in the case of this variant:
  'abc.  ...de' LIKE '%c_d%' COLLATE nd3alt
it's not necessarily clear how (or even if) it should work.


[1] https://www.postgresql.org/developer/roadmap/
[2] https://github.com/dverite/icu_ext#icu_strpos


Best regards,
--
Daniel Vérité
PostgreSQL-powered mailer: http://www.manitou-mail.org
Twitter: @DanielVerite

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
[Index of Archives]     [Postgresql Jobs]     [Postgresql Admin]     [Postgresql Performance]     [Linux Clusters]     [PHP Home]     [PHP on Windows]     [Kernel Newbies]     [PHP Classes]     [PHP Books]     [PHP Databases]     [Postgresql & PHP]     [Yosemite]

  Powered by Linux