Re: F41 Change Proposal: Change Compose Settings (system-wide)

On 26/03/24 10:41, Sirius via devel wrote:
> In days of yore (Tue, 26 Mar 2024), fedora-devel thus quoth:
>> In days of yore (Thu, 21 Mar 2024), Stephen Smoogen thus quoth:
>>> On Wed, 20 Mar 2024 at 22:01, Kevin Kofler via devel <
>>> devel@xxxxxxxxxxxxxxxxxxxxxxx> wrote:
>>>
>>>> Aoife Moloney wrote:
>>>>> The zstd compression type was chosen to match createrepo_c settings.
>>>>> As an alternative, we might want to choose xz,
>>>> Since xz consistently compresses better than zstd, I would strongly
>>>> suggest
>>>> using xz everywhere to minimize download sizes. However:
>>>>
>>>>> especially after zlib-ng has been made the default in Fedora and brought
>>>>> performance improvements.
>>>> zlib-ng is for gz, not xz, and gz is fast, but compresses extremely poorly
>>>> (which is mostly due to the format, so, while some implementations manage
>>>> to
>>>> do better than others at the expense of more compression time, there is a
>>>> limit to how well they can do and it is nowhere near xz or even zstd) and
>>>> should hence never be used at all.
>>>>
>>>>
>>> There are two parts to this which users will see as 'slowness'. Part one is
>>> downloading the data from a mirror. Part two is uncompressing the data. In
>>> work I have been a part of, we have found that while xz gave us much
>>> smaller files, the time to uncompress was so much larger that our download
>>> gains were lost. Using zstd gave larger downloads (maybe 10 to 20% bigger)
>>> but uncompressed much faster than xz. This is data dependent though so it
>>> would be good to see if someone could test to see if xz uncompression of
>>> the datafiles will be too slow.
>> Hi there,
>>
>> Ran tests with gzip 1-9 and xz 1-9 on an F41 XML file that was 940MiB.
> Added tests with zstd 1-19, not using a dictionary to improve it any
> further.
>
> Input File: f41-filelist.xml, Size: 985194446 bytes
>
> ZStd Level  1,     1.7s to compress, 6.46% file size,  0.6s decompress
> ZStd Level  2,     1.7s to compress, 6.34% file size,  0.7s decompress
> ZStd Level  3,     2.1s to compress, 6.26% file size,  0.7s decompress
> ZStd Level  4,     2.3s to compress, 6.26% file size,  0.7s decompress
> ZStd Level  5,     5.7s to compress, 5.60% file size,  0.6s decompress
> ZStd Level  6,     7.2s to compress, 5.42% file size,  0.6s decompress
> ZStd Level  7,     8.1s to compress, 5.39% file size,  0.6s decompress
> ZStd Level  8,     9.5s to compress, 5.31% file size,  0.6s decompress
> ZStd Level  9,    10.4s to compress, 5.28% file size,  0.6s decompress
> ZStd Level 10,    13.6s to compress, 5.26% file size,  0.6s decompress
> ZStd Level 11,    18.4s to compress, 5.25% file size,  0.6s decompress
> ZStd Level 12,    19.5s to compress, 5.25% file size,  0.6s decompress
> ZStd Level 13,    30.9s to compress, 5.25% file size,  0.6s decompress
> ZStd Level 14,    39.7s to compress, 5.23% file size,  0.6s decompress
> ZStd Level 15,    56.1s to compress, 5.21% file size,  0.6s decompress
> ZStd Level 16,  1min58s to compress, 5.52% file size,  0.7s decompress
> ZStd Level 17,  2min25s to compress, 5.36% file size,  0.7s decompress
> ZStd Level 18,  3min46s to compress, 5.43% file size,  0.8s decompress
> ZStd Level 19, 10min36s to compress, 4.66% file size,  0.7s decompress
>
> So to save 5.2MB in file size (lvl19 vs lvl15), the server has to spend
> eleven times longer compressing the file (and I did not look at resources
> like CPU or RAM while doing this). I am sure there are other compression
> mechanisms that can squeeze these files a bit further, but at what cost?
> If it is a once-a-day event, maybe a high compression ratio is
> justifiable. If it has to happen hundreds of times per day - not so much.
>
>
> ## zstd
> # Expects INPUTFILE and INPUTFILESIZE to be set by the caller beforehand.
> function do_zstd()
> {
>    let cl=1
>    echo Input File: ${INPUTFILE}, Size: ${INPUTFILESIZE} bytes
>    echo
>    # Loop over compression levels 1 through 19
>    while [[ $cl -le 19 ]]
>    do
>      echo ZStd compression level ${cl}
>      echo Time to compress the file
>      time zstd -z -${cl} ${INPUTFILE}
>      # Compressed size in bytes, read from the directory listing
>      COMPRESSED_SIZE=$(ls -ln ${INPUTFILE}.zst | awk '{print $5}')
>      echo Compressed to
>      echo "scale=5
>      ${COMPRESSED_SIZE}/${INPUTFILESIZE}*100
>      "|bc
>      echo % of original
>      echo Time to decompress the file, output to /dev/null
>      time zstd -d -c ${INPUTFILE}.zst > /dev/null
>      rm -f ${INPUTFILE}.zst
>      let cl=$cl+1
>      echo
>    done
> }
>
> --
> Kind regards,
>
> /S
> --

Also note that adding '-T0' to use all available CPU cores will greatly 
speed up compression with zstd.
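
For example, re-running a single compression from the test above with all 
cores might look like this (file name taken from the test, level 19 picked 
purely for illustration):

  # compress with level 19, -T0 = use all available cores
  time zstd -z -19 -T0 f41-filelist.xml

  # decompress to /dev/null to time only the decompression
  time zstd -d -c f41-filelist.xml.zst > /dev/null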

However, all this talking about the optimal compression level is moot, 
because in the end there's no way to pass that setting through 
createrepo_c's options, so.... ;-)
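
As far as I can tell, the only related knob createrepo_c exposes is the 
compression type, not the level - e.g. something like (repo path is just a 
placeholder):

  createrepo_c --general-compress-type zstd /path/to/repo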

Mattia
