On Tue, 2023-02-14 at 12:17 +0100, Dominique Devienne wrote:
> On Tue, Feb 14, 2023 at 11:23 AM Laurenz Albe <laurenz.albe@xxxxxxxxxxx> wrote:
> > On Tue, 2023-02-14 at 10:31 +0100, Dominique Devienne wrote:
> > > Surely sorting should be "constant left-to-right", no? What are we missing?
> >
> > No, it isn't. That's not how natural language collations work.
>
> Honestly, who expects the same prefix to sort differently based on what comes
> after, in left-to-right languages?
> How does one even find out what the (capricious?) rules for sorting in a given
> collation are?

Look at the documentation / implementation. As far as ICU is concerned, here:
https://unicode.org/reports/tr10/

> > > I'm already surprised (star) comes before (space), when the latter "comes
> > > before" the former in both ASCII and UTF-8, but that the two "Foo*" and "Foo "
> > > prefixed pairs are not clustered after sorting is just mystifying to me. So how come?
> >
> > Because they compare identical on the first three levels. Any difference in
> > letters, accents or case weighs stronger, even if it occurs to the right
> > of these substrings.
>
> That's completely unintuitive...

Well, you can complain to GNU and the Unicode Consortium, but that's pretty much
the way it is.

> > Yes, it sounds like the "C" collation may be best for you. That is, if you don't
> > mind that "Z" < "a".
>
> I would mind if I asked for case-insensitive comparisons.
>
> So the "C" collation is fine with general UTF-8 encoding?
> I.e. it will be codepoint ordered OK?

Yes, exactly.

Yours,
Laurenz Albe
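
PS: For illustration only, here is a small Python sketch of the two behaviours discussed above. The sample strings and all function names are hypothetical and not taken from the thread. The first two functions show code point ordering, which is what the "C" collation gives you with UTF-8 (sorting the encoded bytes yields the same order). The third is a deliberately simplified two-level comparison: it is NOT the Unicode Collation Algorithm and does not reproduce ICU or glibc weights. It only shows how a difference in letters can outrank punctuation, so that strings sharing a "Foo*" or "Foo " prefix need not end up next to each other.

```python
# Hypothetical sample data, not the strings from the original report.
words = ["Foo*b", "Foo a", "Zed", "Foo*a", "apple", "Foo b"]


def codepoint_sorted(items):
    """Sort by Unicode code point (Python's default str comparison)."""
    return sorted(items)


def utf8_byte_sorted(items):
    """Sort by the UTF-8 encoded bytes, i.e. a memcmp-style comparison."""
    return sorted(items, key=lambda s: s.encode("utf-8"))


def toy_multilevel_key(s):
    """Toy two-level sort key (an assumption for illustration, not UCA).

    Level 1: letters and digits only, case-folded; punctuation and
             spaces are ignored at this level.
    Level 2: the raw string, used only to break level-1 ties.
    """
    primary = "".join(ch for ch in s if ch.isalnum()).casefold()
    return (primary, s)


def toy_multilevel_sorted(items):
    """Sort with the toy key: letters decide first, punctuation last."""
    return sorted(items, key=toy_multilevel_key)


if __name__ == "__main__":
    print("code point :", codepoint_sorted(words))
    print("UTF-8 bytes:", utf8_byte_sorted(words))
    print("toy levels :", toy_multilevel_sorted(words))
```

With code point ordering the "Foo " pair and the "Foo*" pair each stay together, and "Zed" sorts before "apple". With the toy key, the trailing letter is compared before the punctuation, so the pairs interleave.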