Re: Replacing glibc langpacks

Florian Weimer <fweimer@xxxxxxxxxx> · Mon, 03 Jun 2019 16:06:45 +0200

* Zbigniew Jędrzejewski-Szmek:

> On Mon, Jun 03, 2019 at 02:59:13PM +0200, Florian Weimer wrote:
>> * Zbigniew Jędrzejewski-Szmek:
>> 
>> > On Mon, May 27, 2019 at 09:13:50PM +0200, Florian Weimer wrote:
>> >> * Tomasz Kłoczko:
>> >> 
>> >> > On Mon, 27 May 2019 at 10:41, Florian Weimer <fweimer@xxxxxxxxxx> wrote:
>> >> >> I'm investigating whether it makes sense to switch to a scheme where the
>> >> >> glibc locale data is built from source, during package installation,
>> >> >> based on the langpack configuration system.  This is similar to what
>> >> >> Debian does.
>> >> >>
>> >> >> The reason is that the compressed locale source code (without the
>> >> >> charmaps, which are not strictly needed once we patch localedef)
>> 
>> > Can you expand a bit on the part about patching localedef?
>> 
>> localedef currently reads character conversion tables from charmap files
>> under /usr/share/i18n/charmaps.  The same information is contained in
>> the gconv modules unconditionally installed under /usr/lib*/gconv.
>> 
>> > Do I understand correctly, that the saving essentially comes from the fact
>> > that current glibc-langpack-en contains 14 localized variants (AU, BW, ZA,
>> > US, ...), and only a subset of those could be generated in your proposal?
>> > If so, would simply splitting glibc-langpack-en further into subpackages
>> > be an alternative? E.g. glibc-langpack-en-US, glibc-langpack-en-AU,
>> > ... ?
>> 
>> In theory, yes, but that would result in a few dozen more langpack
>> packages.
>> 
>> The other variable is the supported charset (UTF-8, or a single-byte
>> charset such as ISO-8859-1 or ISO-8859-15).
>
> Hmm, so maybe that's the way to go: split each langpack into
> glibc-langpack-XX and glibc-langpack-XX-legacy. Not installing -legacy
> will halve the disk usage, no?

This will nearly double the number of langpack packages needed by glibc.
We also use hard links to share identical files across locales, so the
legacy variants add less disk usage than their number suggests.  Compare
the output of “du -hcs /usr/lib/locale/en_*”, “du -hcsl
/usr/lib/locale/en_*”, “du -hcs /usr/lib/locale/en_US.utf8/” and finally
“du -hcs /usr/lib/locale/en_US{,.utf8}/”.
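
For readers without a Fedora system at hand, the effect that the du
invocations above expose can be approximated with a minimal Python
sketch.  It is an illustration only, not part of glibc or the langpack
tooling: the function name disk_usage and its count_links parameter are
made up for this example, and it sums apparent file sizes (st_size)
rather than allocated blocks, so the totals will not match du exactly.

```python
import os
import sys


def disk_usage(roots, count_links=False):
    """Return the total apparent size, in bytes, of regular files under roots.

    count_links=False counts each inode once, roughly like plain "du".
    count_links=True counts every hard link separately, roughly like "du -l".
    """
    seen = set()  # (device, inode) pairs already counted
    total = 0
    for root in roots:
        for dirpath, _dirnames, filenames in os.walk(root):
            for name in filenames:
                st = os.lstat(os.path.join(dirpath, name))
                key = (st.st_dev, st.st_ino)
                if not count_links and key in seen:
                    continue  # hard link to a file that was already counted
                seen.add(key)
                total += st.st_size
    return total


if __name__ == "__main__" and len(sys.argv) > 1:
    paths = sys.argv[1:]
    mib = 1024 * 1024
    print("shared links counted once: %.1f MiB" % (disk_usage(paths) / mib))
    print("every link counted:        %.1f MiB"
          % (disk_usage(paths, count_links=True) / mib))
```

Saved under a hypothetical name such as locale_du.py, it could be run as
"python3 locale_du.py /usr/lib/locale/en_*".  The gap between the two
totals is the space that the hard links save.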

In short, there's 6.7 MiB today, 2.9 MiB for UTF-8 only, and 3.2 MiB for
UTF-8 and ISO-8859-1.  (I don't think skipping en_US is realistic.)
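
The arithmetic behind these three figures can be spelled out in a short
Python sketch.  It uses only the numbers quoted above and the labels
given for them.  The constant and function names are invented for the
example, and the message does not state which du invocation produced
which figure, so no such mapping is assumed here.

```python
# Figures quoted in the message, in MiB.
TODAY = 6.7
UTF8_ONLY = 2.9
UTF8_AND_LATIN1 = 3.2  # UTF-8 plus ISO-8859-1


def percent_of(part, whole):
    """Return part as a percentage of whole, rounded to one decimal."""
    return round(100.0 * part / whole, 1)


# Extra space taken by adding ISO-8859-1 on top of UTF-8.
legacy_extra = round(UTF8_AND_LATIN1 - UTF8_ONLY, 1)

print("UTF-8 only vs. today:        %.1f%%" % percent_of(UTF8_ONLY, TODAY))
print("UTF-8 + ISO-8859-1 vs. today: %.1f%%" % percent_of(UTF8_AND_LATIN1, TODAY))
print("cost of adding ISO-8859-1:    %.1f MiB (%.1f%% over UTF-8 only)"
      % (legacy_extra, percent_of(legacy_extra, UTF8_ONLY)))
```

On these numbers, adding ISO-8859-1 costs about 0.3 MiB on top of the
UTF-8 data, which is far less than a doubling.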

Thanks,
Florian