On Wed, Feb 20, 2019 at 1:42 PM Bob Jolliffe <bobjolliffe@xxxxxxxxx> wrote:
> It seems not to be (completely) particular to the installation.
> Testing on different platforms we found variable speed difference
> between 100x and 1000x slower, but always a considerable order of
> magnitude. The very slow performance comes from sorting Lao
> characters using en_US.UTF-8 collation.

I knew that some collations were slower, generally for reasons that
make some sense. For example, I was aware that ICU's use of the
Japanese standard JIS X 4061 is particularly complicated and
expensive, but it produces the most useful possible result from the
point of view of a Japanese speaker. Apparently glibc does not use
that algorithm, and so offers a less useful sort order (though it may
actually be faster in that particular case).

I suspect that the reasons why the Lao locale sorts so much more
slowly may also have something to do with the intrinsic cost of
supporting more complicated rules. However, the difference is so
ridiculously large that it also seems likely that somebody was
disinclined to go to the effort of optimizing it. The ICU people found
that to be a tractable goal, but they may have had to work at it.

I also have a vague notion that there are special cases that are more
or less only useful for sorting French. These complicate the
implementation of UCA-style algorithms.

I am only speculating, based on what I've heard about other cases;
perhaps this explanation is totally wrong. I know a lot more about
this stuff than most people on this mailing list, but I'm still far
from being an expert.

--
Peter Geoghegan