Re: [PATCH v3 0/5] Avoid spawning gzip in git archive

Hi René,

On Tue, 14 Jun 2022, René Scharfe wrote:

> Am 14.06.22 um 13:28 schrieb Johannes Schindelin:
> >
> > By the way, the main reason why I did not work more is that in
> > http://madler.net/pipermail/zlib-devel_madler.net/2019-December/003308.html,
> > Mark Adler (the zlib maintainer) announced that...
> >
> >> [...] There are many well-tested performance improvements in zlib
> >> waiting in the wings that will be incorporated over the next several
> >> months. [...]
> >
> > This was in December 2019. And now it's June 2022 and I kind of wonder
> > whether those promised improvements will still come.
> >
> > In the meantime, however, a viable alternative seems to have cropped up:
> > https://github.com/zlib-ng/zlib-ng. Essentially, it looks as if it is what
> > zlib should have become after the above-quoted announcement.
> >
> > In particular the CPU intrinsics support (think MMX, SSE2/3, etc) seems to
> > be very interesting and I would not be completely surprised if building
> > Git with your patches and linking against zlib-ng would paint a very
> > favorable picture not only in terms of CPU time but also in terms of
> > wallclock time. Sadly, I have not been able to set aside time to look into
> > that angle, but maybe I can pique your interest?
>
> I was unable to preload zlib-ng using DYLD_INSERT_LIBRARIES on macOS
> 12.4 so far.  The included demo proggy looks impressive, though:
>
> $ hyperfine -w3 -L gzip gzip,../zlib-ng/minigzip "git -C ../linux archive --format=tar HEAD | {gzip} -c"
> Benchmark #1: git -C ../linux archive --format=tar HEAD | gzip -c
>   Time (mean ± σ):     20.424 s ±  0.006 s    [User: 23.964 s, System: 0.432 s]
>   Range (min … max):   20.414 s … 20.434 s    10 runs
>
> Benchmark #2: git -C ../linux archive --format=tar HEAD | ../zlib-ng/minigzip -c
>   Time (mean ± σ):     12.158 s ±  0.006 s    [User: 13.908 s, System: 0.376 s]
>   Range (min … max):   12.145 s … 12.166 s    10 runs
>
> Summary
>   'git -C ../linux archive --format=tar HEAD | ../zlib-ng/minigzip -c' ran
>     1.68 ± 0.00 times faster than 'git -C ../linux archive --format=tar HEAD | gzip -c'

Intriguing.

I finally managed to play around with building and packaging zlib-ng [*1*]
(since I want to use it as a drop-in replacement for zlib, I think it is
best to configure it with `--zlib-compat`, that way I do not have to
fiddle with any equivalent of `LD_PRELOAD`). Here are my numbers:

	zlib-ng: 14.409 s ± 0.209 s
	zlib:    26.843 s ± 0.636 s

These are pretty good, which made me think that they might actually even
help regular Git operations (because we zlib every loose object).

So I tried to `fast-import` some 2500 commits from linux.git into a fresh
repository, and the zlib-ng version takes ~51s and the zlib version takes
~58s. At first I thought that it might be noise, but the trend seems to be
steady. It's not a huge improvement, of course, but I think that might be
because most of the time is spent parsing.
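
To put the two measurements above on the same scale, here is a minimal sketch of the arithmetic. The timings are copied from this message (the `fast-import` ones are only approximate, as the `~` indicates); the helper names `speedup` and `time_saved_pct` are made up for illustration.

```python
def speedup(baseline_s: float, candidate_s: float) -> float:
    """How many times faster the candidate is than the baseline."""
    return baseline_s / candidate_s


def time_saved_pct(baseline_s: float, candidate_s: float) -> float:
    """Percentage of the baseline wallclock time that the candidate saves."""
    return 100.0 * (baseline_s - candidate_s) / baseline_s


# (zlib seconds, zlib-ng seconds), taken from the numbers quoted above.
measurements = {
    "git archive": (26.843, 14.409),
    "fast-import (approximate)": (58.0, 51.0),
}

for name, (zlib_s, zlib_ng_s) in measurements.items():
    print(
        f"{name}: {speedup(zlib_s, zlib_ng_s):.2f}x faster, "
        f"{time_saved_pct(zlib_s, zlib_ng_s):.1f}% less time"
    )
```

The same formula applied to the quoted hyperfine run (20.424 s vs 12.158 s) gives the 1.68 that hyperfine reports in its summary.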

I then tried to test the performance focusing on writing loose objects, by
using p0008 (increasing the number of files from 50 to 1500 and
restricting it to fsyncMethod=none).

Unfortunately, the numbers are not really conclusive. I mostly see minor
speed-ups with zlib-ng, in the single-digit percentages, though
occasionally the difference goes in the other direction. In other words,
there is no clear-cut change, just a vague tendency. My guess: the files
Git writes in this test are too small (their contents are of the form
"$basedir$test_tick.$counter") for zlib-ng's superior performance to come
to bear.
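
To illustrate how little data such a file hands to the compressor, here is a rough sketch of what a loose blob looks like before and after deflating. It uses Python's `zlib` module rather than Git's own code, assumes the default SHA-1 object format, and uses Python's default compression level, which is not necessarily the level Git is configured with. The sample content is hypothetical, merely in the spirit of "$basedir$test_tick.$counter".

```python
import hashlib
import zlib
from typing import Tuple


def loose_object(content: bytes) -> Tuple[str, bytes]:
    """Return (object id, deflated bytes) for a blob with the given content."""
    # A loose blob is "blob <size>\0<content>": hashed with SHA-1, then deflated.
    store = b"blob " + str(len(content)).encode() + b"\0" + content
    return hashlib.sha1(store).hexdigest(), zlib.compress(store)


# Hypothetical tiny file content, not taken from an actual p0008 run.
tiny = b"dir/1112911993.42"
oid, deflated = loose_object(tiny)
print(oid)
print(len(tiny), "content bytes ->", len(deflated), "deflated bytes")
```

With only a few dozen bytes per object, the deflate call itself has very little to do, so it seems plausible that per-file costs such as creating and writing the file dominate, which would be consistent with the guess above.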

Still, for larger workloads, zlib-ng seems to offer a quite nice and
substantial performance improvement over zlib.

Ciao,
Dscho

Footnote *1*: https://github.com/msys2/MINGW-packages/compare/master...dscho:zlib-ng
