On Sun, Feb 5, 2023 at 4:19 PM Tom Lane <tgl@xxxxxxxxxxxxx> wrote:
> If there's a predominant language in the data, selecting a collation
> matching that seems like your best bet. Otherwise, maybe you should
> just shrug your shoulders and stick with C collation. It's likely
> to be faster than any alternative.

FWIW there are certain "compromise locales" supported by ICU/CLDR.
These include "English (Europe)", and, most notably, EOR (European
Ordering Rules):

https://en.wikipedia.org/wiki/European_ordering_rules

EOR seems to have been standardized by the EU or by an adjacent
institution, but I'm not sure how widely used either of these really is.

It's also possible to use a custom collation with ICU, which is almost
infinitely flexible:

http://www.unicode.org/reports/tr10/#Customization

As an example, the rules about the relative ordering of each script
can be changed this way. There is also something called merged
tailorings.

The OP should see the Postgres ICU docs for hints on how to use these
facilities to make a custom collation that matches whatever their
requirements are:

https://www.postgresql.org/docs/current/collation.html#COLLATION-MANAGING

--
Peter Geoghegan
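
To make the trade-off with C collation concrete, here is a minimal sketch of what bytewise ordering does to mixed-language data. It assumes UTF-8 encoded text; the helper name `c_collation_sort` and the sample words are made up for illustration and are not part of Postgres or ICU. It only mimics the ordering in Python and does not talk to a database.

```python
def c_collation_sort(words):
    """Sort strings by their UTF-8 bytes, mimicking a bytewise (C) collation.

    Hypothetical helper for illustration only; assumes UTF-8 text.
    """
    return sorted(words, key=lambda w: w.encode("utf-8"))


# Sample words chosen for illustration (not from the original thread).
words = ["zebra", "Zürich", "apple", "Ärger", "Éclair", "eclair"]
c_sorted = c_collation_sort(words)

# Uppercase ASCII sorts before lowercase ASCII, and accented letters
# sort after all of ASCII, regardless of language.
print(c_sorted)
```

With bytewise ordering, "Zürich" lands before "apple" and "Ärger" lands after "zebra", which is the kind of result a linguistic collation (a compromise locale or a custom ICU collation) is meant to avoid.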