On Sat, Sep 16, 2023 at 7:42 AM Tom Lane <tgl@xxxxxxxxxxxxx> wrote:
> Sadly, this proves very little about Linux's behavior. glibc's idea
> of en_US involves some very complicated multi-pass sort rules.
> AFAICT from the FreeBSD sort(1) man page, FreeBSD defines en_US
> as "same as C except case-insensitive", whereas I'm pretty sure
> that underscores and other punctuation are nearly ignored in
> glibc's interpretation; they'll only be taken into account if the
> alphanumeric parts of the strings sort equal.

Achilleas didn't mention the glibc version, but based on the kernel
vintage mentioned, I guess that must be the "old" (pre-2.28) glibc
sorting. In 2.28 they did a big sync-up with ISO 14651, while FreeBSD
follows the UCA, a closely related standard[1]. I think newer
Linux/glibc systems should agree with FreeBSD's libc in more cases
(and also agree with ICU).

[1] https://unicode.org/reports/tr10/#Synch_ISO14651
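To make the difference Tom describes concrete, here is a deliberately simplified toy model of the two behaviours: a "C but case-insensitive" comparison versus a two-pass comparison where punctuation is skipped at first and only breaks ties between strings whose alphanumeric parts are equal. This is NOT glibc's real collation (which has more levels and locale tables) nor the ISO 14651/UCA algorithm; the function names and the exact tie-break rule are hypothetical, chosen only for illustration.

```python
# Toy model only: NOT glibc's real collation tables or the ISO 14651 / UCA algorithm.
# toy_freebsd_key and toy_glibc_key are hypothetical names used for illustration.

def toy_freebsd_key(s: str) -> str:
    """'Same as C except case-insensitive': compare code points after case folding."""
    return s.casefold()

def toy_glibc_key(s: str) -> tuple[str, str]:
    """Two-pass comparison: punctuation is skipped on the first pass and only
    matters when the alphanumeric parts compare equal."""
    primary = "".join(c for c in s if c.isalnum()).casefold()
    # Assumed tie-breaker: the full case-folded string, punctuation included.
    return (primary, s.casefold())

words = ["a_c", "ab", "a_b"]
print(sorted(words, key=toy_freebsd_key))  # ['a_b', 'a_c', 'ab']
print(sorted(words, key=toy_glibc_key))    # ['a_b', 'ab', 'a_c']
```

Under the C-like rule the underscore (0x5F) sorts before every lowercase letter, so both underscore strings come first; under the two-pass rule "ab" and "a_b" are grouped together because their alphanumeric parts match.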