Re: [PATCH v3 0/5] Avoid spawning gzip in git archive

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Me again,

On Thu, 30 Jun 2022, Johannes Schindelin wrote:

> I finally managed to play around with building and packaging zlib-ng
> [*1*] (since I want to use it as a drop-in replacement for zlib, I think
> it is best to configure it with `--zlib-compat`, that way I do not have
> to fiddle with any equivalent of `LD_PRELOAD`). Here are my numbers:
>
> 	zlib-ng: 14.409 s ± 0.209 s
> 	zlib:    26.843 s ± 0.636 s
>
> These are pretty good, which made me think that they might actually even
> help regular Git operations (because we zlib every loose object).
>
> So I tried to `fast-import` some 2500 commits from linux.git into a fresh
> repository, and the zlib-ng version takes ~51s and the zlib version takes
> ~58s. At first I thought that it might be noise, but the trend seems to be
> steady. It's not a huge improvement, of course, but I think that might be
> because most of the time is spent parsing.
>
> I then tried to test the performance focusing on writing loose object, by
> using p0008 (increasing the number of files from 50 to 1500 and
> restricting it to fsyncMethod=none).
>
> Unfortunately, the numbers are not really conclusive. I do see minor
> speed-ups with zlib-ng, mostly, in the single digit percentages, though
> occasionally in the other direction. In other words, there is no clear-cut
> change, just a vague tendency. My guess: Git writes too small files (their
> contents are of the form "$basedir$test_tick.$counter") and zlib-ng's
> superior performance does not come to bear.
>
> Still, for larger workloads, zlib-ng seems to offer a quite nice and
> substantial performance improvement over zlib.

Stolee pointed out to me that objects inside pack files are also
zlib-compressed, and that measuring the speed of `git rev-list --objects
--all --count` might therefore be a better test.

And this is where things get a little messy: in the context of Git for
Windows, my local measurements indicate that zlib is better, with ~41
seconds using zlib vs ~52 seconds using zlib-ng (but the latter has a
rather large variance).

These measurements were done with a relatively straight-forward build of
zlib-ng v2.0.6, and on a hunch I then tried to build the tip of zlib-ng's
`develop` branch (which was much less straight-forward) and now get
virtually the same speed with that `rev-list` command.

But then I repeated the `archive` measurement with the `develop` version
of zlib-ng, and while it was still substantially faster than zlib, it was
slightly slower than zlib-ng v2.0.6 (zlib: ~26 seconds, zlib-ng v2.0.6:
~14 seconds, zlib-ng develop: ~16 seconds). Still, much, much faster than
using `-c tar.tgz.command="gzip -cn"` at ~24 seconds.

So: the picture is messy. The latest official release of zlib-ng seems to
offer performance wins using `archive` but slight losses using `rev-list.
Upgrading to the latest revision of zlib-ng offers slightly smaller
performance wins using `archive` and equivalent performance using
`rev-list`. Both blow `gzip -cn` out of the water, thanks to using MMX or
whatever my laptop's CPU offers.

The take-away as far as Git for Windows is concerned: It seems not _quite_
the time yet to switch from zlib to zlib-ng, I want to wait until there is
an official zlib-ng release with favorable speed.

Ciao,
Dscho

P.S.: I pushed a WIP update to this branch:

> Footnote *1*: https://github.com/msys2/MINGW-packages/compare/master...dscho:zlib-ng

[Index of Archives]     [Linux Kernel Development]     [Gcc Help]     [IETF Annouce]     [DCCP]     [Netdev]     [Networking]     [Security]     [V4L]     [Bugtraq]     [Yosemite]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Linux SCSI]     [Fedora Users]

  Powered by Linux