Re: F41 Change Proposal: Change Compose Settings (system-wide)

Sirius via devel <devel@xxxxxxxxxxxxxxxxxxxxxxx> · Tue, 26 Mar 2024 10:41:44 +0100

In days of yore (Tue, 26 Mar 2024), fedora-devel thus quoth: 
> In days of yore (Thu, 21 Mar 2024), Stephen Smoogen thus quoth: 
> > On Wed, 20 Mar 2024 at 22:01, Kevin Kofler via devel <
> > devel@xxxxxxxxxxxxxxxxxxxxxxx> wrote:
> > 
> > > Aoife Moloney wrote:
> > > > The zstd compression type was chosen to match createrepo_c settings.
> > > > As an alternative, we might want to choose xz,
> > >
> > > Since xz consistently compresses better than zstd, I would strongly
> > > suggest using xz everywhere to minimize download sizes. However:
> > >
> > > > especially after zlib-ng has been made the default in Fedora and brought
> > > > performance improvements.
> > >
> > > zlib-ng is for gz, not xz, and gz is fast, but compresses extremely poorly
> > > (which is mostly due to the format, so, while some implementations manage
> > > to do better than others at the expense of more compression time, there
> > > is a limit to how well they can do and it is nowhere near xz or even
> > > zstd) and should hence never be used at all.
> > >
> > >
> > There are two parts to this which users will see as 'slowness'. Part one is
> > downloading the data from a mirror. Part two is uncompressing the data. In
> > work I have been a part of, we have found that while xz gave us much
> > smaller files, the time to uncompress was so much larger that our download
> > gains were lost. Using zstd gave larger downloads (maybe 10 to 20% bigger)
> > but uncompressed much faster than xz. This is data dependent though so it
> > would be good to see if someone could test to see if xz uncompression of
> > the datafiles will be too slow.
> 
> Hi there,
> 
> Ran tests with gzip 1-9 and xz 1-9 on a F41 XML file that was 940MiB.

Added tests with zstd 1-19, not using a dictionary to improve it any
further.

Input File: f41-filelist.xml, Size: 985194446 bytes

ZStd Level  1,     1.7s to compress, 6.46% file size,  0.6s decompress
ZStd Level  2,     1.7s to compress, 6.34% file size,  0.7s decompress
ZStd Level  3,     2.1s to compress, 6.26% file size,  0.7s decompress
ZStd Level  4,     2.3s to compress, 6.26% file size,  0.7s decompress
ZStd Level  5,     5.7s to compress, 5.60% file size,  0.6s decompress
ZStd Level  6,     7.2s to compress, 5.42% file size,  0.6s decompress
ZStd Level  7,     8.1s to compress, 5.39% file size,  0.6s decompress
ZStd Level  8,     9.5s to compress, 5.31% file size,  0.6s decompress
ZStd Level  9,    10.4s to compress, 5.28% file size,  0.6s decompress
ZStd Level 10,    13.6s to compress, 5.26% file size,  0.6s decompress
ZStd Level 11,    18.4s to compress, 5.25% file size,  0.6s decompress
ZStd Level 12,    19.5s to compress, 5.25% file size,  0.6s decompress
ZStd Level 13,    30.9s to compress, 5.25% file size,  0.6s decompress
ZStd Level 14,    39.7s to compress, 5.23% file size,  0.6s decompress
ZStd Level 15,    56.1s to compress, 5.21% file size,  0.6s decompress
ZStd Level 16,  1min58s to compress, 5.52% file size,  0.7s decompress
ZStd Level 17,  2min25s to compress, 5.36% file size,  0.7s decompress
ZStd Level 18,  3min46s to compress, 5.43% file size,  0.8s decompress
ZStd Level 19, 10min36s to compress, 4.66% file size,  0.7s decompress

So to save 5.2MB in file size (lvl19 vs lvl15) the server has to spend
eleven times longer compressing the file (and I did not look at resources
like CPU or RAM while doing this). I am sure there are other compression
mechanisms that can squeeze these files a bit further, but at what cost?
If it is a once-a-day event, maybe a high compression ratio is
justifiable. If it has to happen hundreds of times per day, not so much.
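
For anyone who wants to check the arithmetic behind that comparison, here
is a minimal sketch that re-derives it from the table figures. The function
names are made up for this example, and it assumes an awk that accepts -v
assignments (any POSIX awk should do).

```bash
#!/usr/bin/env bash
# Sketch only: re-derives the level 15 vs level 19 comparison from the
# table figures. Function names are hypothetical, chosen for this example.
export LC_ALL=C   # keep the decimal point a "." regardless of locale

# bytes_saved ORIGINAL_BYTES PCT_A PCT_B -> bytes saved going from A to B
bytes_saved() {
  awk -v s="$1" -v a="$2" -v b="$3" 'BEGIN { printf "%.0f\n", s * (a - b) / 100 }'
}

# to_mib BYTES -> size in MiB, one decimal
to_mib() {
  awk -v n="$1" 'BEGIN { printf "%.1f\n", n / 1048576 }'
}

# slowdown SECS_A SECS_B -> how many times longer B takes than A
slowdown() {
  awk -v a="$1" -v b="$2" 'BEGIN { printf "%.1f\n", b / a }'
}

saved=$(bytes_saved 985194446 5.21 4.66)
echo "saved: ${saved} bytes ($(to_mib "$saved") MiB)"
# 10min36s = 636s for level 19, 56.1s for level 15
echo "lvl19 takes $(slowdown 56.1 636)x as long as lvl15"
```

The percentages in the table are rounded to two decimals, so the byte count
is only approximate.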

## zstd
# Expects INPUTFILE and INPUTFILESIZE (in bytes) to be set by the caller.
function do_zstd()
{
  local cl=1
  echo "Input File: ${INPUTFILE}, Size: ${INPUTFILESIZE} bytes"
  echo
  while [[ $cl -le 19 ]]
  do
    echo "ZStd compression level ${cl}"
    echo "Time to compress the file"
    time zstd -z -${cl} "${INPUTFILE}"
    # The size in bytes is the fifth column of ls -ln
    COMPRESSED_SIZE=$(ls -ln "${INPUTFILE}.zst" | awk '{print $5}')
    echo "Compressed to"
    # Multiply before dividing so bc does not truncate the ratio early
    echo "scale=5
    ${COMPRESSED_SIZE}*100/${INPUTFILESIZE}
    "|bc
    echo "% of original"
    echo "Time to decompress the file, output to /dev/null"
    time zstd -d -c "${INPUTFILE}.zst" > /dev/null
    rm -f "${INPUTFILE}.zst"
    cl=$((cl + 1))
    echo
  done
}
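
The function above relies on INPUTFILE and INPUTFILESIZE being set elsewhere
in the script. A self-contained variant that benchmarks a single level and
prints one table-style line might look like the sketch below. It is not the
script used for the numbers above: the function names are hypothetical, and
it assumes GNU stat and GNU date (for %N) as found on Fedora.

```bash
#!/usr/bin/env bash
# Sketch only: a one-level benchmark that prints a single table-style line.
# Assumes zstd, GNU stat and GNU date; all function names are hypothetical.
export LC_ALL=C   # keep the decimal point a "." regardless of locale

# percent_of COMPRESSED ORIGINAL -> percentage with two decimals
percent_of() {
  awk -v c="$1" -v o="$2" 'BEGIN { printf "%.2f\n", c * 100 / o }'
}

# elapsed CMD [ARGS...] -> wall-clock seconds, one decimal.
# The command's own stdout and stderr are discarded.
elapsed() {
  local start end
  start=$(date +%s.%N)
  "$@" > /dev/null 2>&1
  end=$(date +%s.%N)
  awk -v s="$start" -v e="$end" 'BEGIN { printf "%.1f\n", e - s }'
}

# bench_zstd FILE LEVEL -> one line in the style of the results table
bench_zstd() {
  local file=$1 level=$2 orig comp ctime dtime
  orig=$(stat -c %s "$file")
  ctime=$(elapsed zstd -q -f "-$level" -o "$file.zst" "$file")
  comp=$(stat -c %s "$file.zst")
  dtime=$(elapsed zstd -q -d -c "$file.zst")
  rm -f "$file.zst"
  printf 'ZStd Level %2d, %ss to compress, %s%% file size, %ss decompress\n' \
    "$level" "$ctime" "$(percent_of "$comp" "$orig")" "$dtime"
}

# Usage (file name taken from the run above):
#   for l in 1 5 10 15 19; do bench_zstd f41-filelist.xml "$l"; done
```

Measuring with date rather than the shell's time keyword gives a single
number that is easy to put on one line, at the cost of losing the user/sys
breakdown.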

-- 
Kind regards,

/S