In days of yore (Tue, 26 Mar 2024), fedora-devel thus quoth:
> In days of yore (Thu, 21 Mar 2024), Stephen Smoogen thus quoth:
> > On Wed, 20 Mar 2024 at 22:01, Kevin Kofler via devel
> > <devel@xxxxxxxxxxxxxxxxxxxxxxx> wrote:
> >
> > > Aoife Moloney wrote:
> > > > The zstd compression type was chosen to match createrepo_c settings.
> > > > As an alternative, we might want to choose xz,
> > >
> > > Since xz consistently compresses better than zstd, I would strongly
> > > suggest using xz everywhere to minimize download sizes. However:
> > >
> > > > especially after zlib-ng has been made the default in Fedora and
> > > > brought performance improvements.
> > >
> > > zlib-ng is for gz, not xz, and gz is fast, but compresses extremely
> > > poorly (which is mostly due to the format, so, while some
> > > implementations manage to do better than others at the expense of more
> > > compression time, there is a limit to how well they can do and it is
> > > nowhere near xz or even zstd) and should hence never be used at all.
> > >
> >
> > There are two parts to this which users will see as 'slowness'. Part one
> > is downloading the data from a mirror. Part two is uncompressing the
> > data. In work I have been a part of, we have found that while xz gave us
> > much smaller files, the time to uncompress was so much larger that our
> > download gains were lost. Using zstd gave larger downloads (maybe 10 to
> > 20% bigger) but uncompressed much faster than xz. This is data dependent
> > though so it would be good to see if someone could test to see if xz
> > uncompression of the datafiles will be too slow.
>
> Hi there,
>
> Ran tests with gzip 1-9 and xz 1-9 on a F41 XML file that was 940MiB.

Added tests with zstd 1-19, not using a dictionary to improve it any further.

Input File: f41-filelist.xml, Size: 985194446 bytes

ZStd Level 1, 1.7s to compress, 6.46% file size, 0.6s decompress
ZStd Level 2, 1.7s to compress, 6.34% file size, 0.7s decompress
ZStd Level 3, 2.1s to compress, 6.26% file size, 0.7s decompress
ZStd Level 4, 2.3s to compress, 6.26% file size, 0.7s decompress
ZStd Level 5, 5.7s to compress, 5.60% file size, 0.6s decompress
ZStd Level 6, 7.2s to compress, 5.42% file size, 0.6s decompress
ZStd Level 7, 8.1s to compress, 5.39% file size, 0.6s decompress
ZStd Level 8, 9.5s to compress, 5.31% file size, 0.6s decompress
ZStd Level 9, 10.4s to compress, 5.28% file size, 0.6s decompress
ZStd Level 10, 13.6s to compress, 5.26% file size, 0.6s decompress
ZStd Level 11, 18.4s to compress, 5.25% file size, 0.6s decompress
ZStd Level 12, 19.5s to compress, 5.25% file size, 0.6s decompress
ZStd Level 13, 30.9s to compress, 5.25% file size, 0.6s decompress
ZStd Level 14, 39.7s to compress, 5.23% file size, 0.6s decompress
ZStd Level 15, 56.1s to compress, 5.21% file size, 0.6s decompress
ZStd Level 16, 1min58s to compress, 5.52% file size, 0.7s decompress
ZStd Level 17, 2min25s to compress, 5.36% file size, 0.7s decompress
ZStd Level 18, 3min46s to compress, 5.43% file size, 0.8s decompress
ZStd Level 19, 10min36s to compress, 4.66% file size, 0.7s decompress

So to save 5.2MB in file size (lvl19 vs lvl15) the server has to spend
eleven times longer compressing the file (and I did not look at resources
like CPU or RAM while doing this).

I am sure there are other compression mechanisms that can squeeze these
files a bit further, but at what cost? If it is a once-a-day event, maybe a
high compression ratio is justifiable. If it has to happen hundreds of
times per day - not so much.
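
For anyone who wants to redo the lvl19 vs lvl15 arithmetic, here is a rough
sketch. The figures are copied from the table above; the variable names are
made up for illustration, and 10min36s is entered as 636 seconds. It should
land at roughly 5.2 MiB saved for about eleven times the compression time.

```shell
#!/usr/bin/env bash
# Figures copied from the results table; variable names are illustrative only.
input_bytes=985194446
lvl15_pct=5.21; lvl15_secs=56.1
lvl19_pct=4.66; lvl19_secs=636    # 10min36s expressed in seconds

# Bytes saved by going from level 15 to level 19.
saved_bytes=$(awk -v n="$input_bytes" -v a="$lvl15_pct" -v b="$lvl19_pct" \
    'BEGIN { printf "%.0f", n * (a - b) / 100 }')

# The same saving expressed in MiB.
saved_mib=$(awk -v s="$saved_bytes" 'BEGIN { printf "%.1f", s / 1048576 }')

# How many times longer level 19 takes to compress than level 15.
time_ratio=$(awk -v a="$lvl15_secs" -v b="$lvl19_secs" \
    'BEGIN { printf "%.1f", b / a }')

echo "saved: ${saved_bytes} bytes (${saved_mib} MiB)"
echo "compression time ratio lvl19/lvl15: ${time_ratio}x"
```

Swapping in the numbers for any other pair of levels gives the same
comparison for that pair.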

## zstd
# INPUTFILE and INPUTFILESIZE are expected to be set before calling this
# function (that part of the script is not shown here).
function do_zstd()
{
    let cl=1
    echo "Input File: ${INPUTFILE}, Size: ${INPUTFILESIZE} bytes"
    echo
    while [[ $cl -le 19 ]]
    do
        echo "ZStd compression level ${cl}"
        echo "Time to compress the file"
        time zstd -z -${cl} "${INPUTFILE}"
        # Size in bytes of the compressed output (5th column of ls -ln)
        COMPRESSED_SIZE=$(ls -ln "${INPUTFILE}.zst" | awk '{print $5}')
        echo "Compressed to"
        # bc needs a separator between the scale setting and the expression
        echo "scale=5; ${COMPRESSED_SIZE}/${INPUTFILESIZE}*100" | bc
        echo "% of original"
        echo "Time to decompress the file, output to /dev/null"
        time zstd -d -c "${INPUTFILE}.zst" > /dev/null
        rm -f "${INPUTFILE}.zst"
        let cl=$cl+1
        echo
    done
}

--
Kind regards,

/S
--
_______________________________________________
devel mailing list -- devel@xxxxxxxxxxxxxxxxxxxxxxx
To unsubscribe send an email to devel-leave@xxxxxxxxxxxxxxxxxxxxxxx
Fedora Code of Conduct: https://docs.fedoraproject.org/en-US/project/code-of-conduct/
List Guidelines: https://fedoraproject.org/wiki/Mailing_list_guidelines
List Archives: https://lists.fedoraproject.org/archives/list/devel@xxxxxxxxxxxxxxxxxxxxxxx
Do not reply to spam, report it: https://pagure.io/fedora-infrastructure/new_issue