* Panu Matilainen:

> On 11/06/2018 02:13 PM, Mike FABIAN wrote:
>> Panu Matilainen <pmatilai@xxxxxxxxxx> wrote:
>>
>>> On 11/06/2018 12:15 PM, Zbigniew Jędrzejewski-Szmek wrote:
>>>> On Tue, Nov 06, 2018 at 12:10:04PM +0200, Panu Matilainen wrote:
>>>>> On 11/06/2018 03:05 AM, Kevin Kofler wrote:
>>>>>> Zbigniew Jędrzejewski-Szmek wrote:
>>>>>>> The first step is to replace LC_ALL=en_US.UTF-8 with LC_ALL=C.UTF-8
>>>>>>> (and similarly for LANG=, LC_CTYPE=, etc.) in all spec files.
>>>>>>
>>>>>> But there are probably many more packages where the setting is hidden in
>>>>>> upstream build scripts.
>>>>>
>>>>> Build- and various other scripts.
>>>>>
>>>>> Is C.UTF-8 glibc upstream now, or is it still Fedora-specific?
>>>>
>>>> It was never Fedora-specific. The original justification in 2013 or so
>>>> was "other distros already do it". It's just glibc upstream that doesn't
>>>> have it.
>>>>
>>>> We still carry
>>>> https://src.fedoraproject.org/rpms/glibc/blob/master/f/glibc-c-utf8-locale.patch,
>>>> so it seems this hasn't been upstream.
>>>
>>> Ugh, this is a rather cumbersome situation for other projects:
>>> supporting and using C.UTF-8 isn't going to happen large scale until
>>> it's upstreamed. And it does make one wonder what exactly is
>>> preventing it from being upstreamed in glibc.
>>
>> The current C.UTF-8 locale doesn’t sort correctly. It should sort
>> according to code point order, but it does that only partly. It is sort
>> of a quick hack. The glibc developers are working on a better solution
>> but this takes more time.
>>
>
> Hmm. Not sorting correctly doesn't sound so good when LANG=C (and now
> C.UTF-8) is quite commonly used exactly for that purpose.

Not all of this looks fixable to me in the current setting. We expose
the table layout via nl_langinfo, so that's part of the ABI, and the
tables just cannot express the sorting order with less than three to
four bytes per codepoint.
That's a lot of data even if we restrict ourselves to the modern UTF-8
range (those codepoints addressable using UTF-16 surrogate pairs).

I think we could generate the tables on the fly if they are ever
requested using nl_langinfo. Not many applications seem to do that.
Internally within glibc, we could use a different interface to avoid
the table generation.

The table layout also has significant problems with expressing proper
collation tables. We need to investigate this more deeply, but my
impression is that the collation and collation sequence tables
constitute a significant fraction of the locale data on disk. Changing
the table layout again has ABI implications there, similar to those for
C.UTF-8, except that the on-the-fly conversion code will be more
difficult to write.

Thanks,
Florian
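
For a rough sense of the numbers discussed in this message, here is a
small Python sketch. It is only an illustration under stated
assumptions: it assumes a flat table with one fixed-size entry for
every code point from U+0000 to U+10FFFF, which is not necessarily how
glibc lays out its locale data, and the function names are made up for
this example. It also checks the property that a C.UTF-8 locale is
expected to have, namely that comparing strings by their UTF-8 bytes
gives the same order as comparing them by code point.

```python
# Hypothetical helpers for illustration; none of this is glibc code.

# Number of code points reachable via UTF-16 (U+0000..U+10FFFF).
UNICODE_CODE_POINTS = 0x110000


def table_size_bytes(bytes_per_code_point, code_points=UNICODE_CODE_POINTS):
    """Size of a flat table with one fixed-size entry per code point.

    Assumption: a flat, uncompressed layout. A real locale file may
    store its tables differently.
    """
    return bytes_per_code_point * code_points


def sorts_in_code_point_order(strings):
    """Return True if UTF-8 byte order matches code point order.

    The input strings must not contain lone surrogates, since those
    cannot be encoded as UTF-8.
    """
    by_code_point = sorted(strings, key=lambda s: [ord(c) for c in s])
    by_utf8_bytes = sorted(strings, key=lambda s: s.encode("utf-8"))
    return by_code_point == by_utf8_bytes


if __name__ == "__main__":
    for width in (3, 4):
        size = table_size_bytes(width)
        print(f"{width} bytes per code point: {size} bytes "
              f"({size / (1024 * 1024):.4f} MiB)")
    sample = ["z", "a", "é", "Z", "€", "\U0001F600", "ab", "aé"]
    print(sorts_in_code_point_order(sample))
```

Under that flat-table assumption, three to four bytes per code point
comes to roughly 3.2 to 4.25 MiB for a single table, which is in line
with the idea of generating such a table only when it is requested.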