Re: Losing my latin on Ordering...

Alvaro Herrera <alvherre@xxxxxxxxxxxxxx> · Tue, 14 Feb 2023 12:35:16 +0100

On 2023-Feb-14, Dominique Devienne wrote:

> Honestly, who expects the same prefix to sort differently based on what
> comes after, in left-to-right languages?

Look, we don't define the collation rules.  We just grab the collation
rules defined by experts in collations.  In this case the experts have
advised the glibc developers to write collations this way; but even if
you went further and looked at the ICU libraries, you would find that
they have pretty much the same definition.
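
As a rough illustration of why what follows a prefix can matter, here is
a toy two-level comparison in Python.  It is only a sketch: toy_key is a
made-up helper for this example, not the glibc or ICU algorithm.  It
merely assumes that letters are compared first with case ignored, and
that case is consulted only as a tie-breaker, which is the general shape
of these linguistic collations.

```python
def toy_key(s: str):
    # Hypothetical two-level sort key (an assumption, for illustration only):
    # level 1 compares the text with case folded away,
    # level 2 breaks ties using the original string.
    return (s.casefold(), s)

words = ["Ac", "ab", "Aa"]

linguistic = sorted(words, key=toy_key)  # letters first, case only as tie-breaker
codepoint = sorted(words)                # plain code point comparison

print(linguistic)
print(codepoint)
```

In the first result the "A" prefix lands both before and after "ab",
because the second letter decides the order before case is ever looked at.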

> How does one even find out what the (capricious?) rules for sorting in a
> given collation are?

I suggest looking carefully at a telephone book sometime (provided you
can find one ... apparently nobody wants them anymore).

> So the "C" collation is fine with general UTF-8 encoding?
> I.e. it will be codepoint ordered OK?

Sure, just make sure to use the definition of C that uses UTF-8 encoding
(I think it's typically called C.UTF-8).
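
The reason this works is that UTF-8 was designed so that comparing the
encoded bytes gives the same order as comparing code points.  A small
Python check of that property follows; the sample strings are arbitrary,
and it relies on Python comparing str values by code point and bytes
values bytewise (the way memcmp would).

```python
words = ["zebra", "Zulu", "élan", "eagle", "日本", "naïve", "\U0001F600", "~"]

# Python compares str values code point by code point.
by_codepoint = sorted(words)

# Comparing the UTF-8 encodings is a plain bytewise comparison.
by_utf8_bytes = sorted(words, key=lambda s: s.encode("utf-8"))

print(by_codepoint == by_utf8_bytes)
```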

-- 
Álvaro Herrera         PostgreSQL Developer  —  https://www.EnterpriseDB.com/