On 2023-Feb-14, Dominique Devienne wrote:

> Honestly, who expects the same prefix to sort differently based on what
> comes after, in left-to-right languages?

Look, we don't define the collation rules.  We just grab the collation
rules defined by experts in collations.  In this case the experts have
advised the glibc developers to write collations this way; but even if
you went further and looked at the ICU libraries, you would find that
they have pretty much the same definition.

> How does one even find out what the (capricious?) rules for sorting in a
> given collation are?

I suggest to look at a telephone book carefully sometime (provided you
can find one ... apparently nobody wants them anymore.)

> So the "C" collation is fine with general UTF-8 encoding?
> I.e. it will be codepoint ordered OK?

Sure, just make sure to use the definition of C that uses UTF-8 encoding
(I think it's typically called C.UTF-8).

-- 
Álvaro Herrera         PostgreSQL Developer  —  https://www.EnterpriseDB.com/
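
To illustrate the two kinds of ordering discussed above, here is a minimal
Python sketch.  It is not what glibc, ICU, or PostgreSQL actually run.  The
first part checks that comparing UTF-8 bytes gives the same order as comparing
code points, which is the property that makes a byte-wise "C" ordering
codepoint ordered.  The second part uses a made-up key function,
phone_book_key (a hypothetical name), that ignores spaces and punctuation the
way a telephone book might; real collation rules are far more elaborate, but
the toy is enough to show how the same prefix can land on either side of
another string depending on what follows it.

```python
# Part 1: byte-wise comparison of UTF-8 equals code point comparison.
words = ["zebra", "Zebra", "Álvaro", "e", "é", "a,b", "ab", "😀"]

by_codepoint = sorted(words)  # Python compares str values code point by code point
by_utf8_bytes = sorted(words, key=lambda s: s.encode("utf-8"))  # memcmp-style comparison

same_order = by_codepoint == by_utf8_bytes


# Part 2: a toy "telephone book" key (hypothetical, NOT glibc's or ICU's rules).
def phone_book_key(s: str):
    # First level: compare letters and digits only, ignoring case,
    # spaces, and punctuation.
    primary = "".join(ch.lower() for ch in s if ch.isalnum())
    # Tie-breaker: fall back to the raw string so the order is deterministic.
    return (primary, s)


entries = ["a c", "ab", "a a"]

c_order = sorted(entries)                          # code point order
phone_order = sorted(entries, key=phone_book_key)  # toy linguistic order

print(same_order)
print(c_order)
print(phone_order)
```

In code point order every string starting with "a " sorts before "ab", because
the space compares lower than "b".  Under the toy key, "a a" sorts before "ab"
but "a c" sorts after it, so where the prefix "a " ends up depends on the
character that comes next.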