On Sun, Apr 14, 2019 at 12:01:10AM +0200, René Scharfe wrote:

> >> As we already link to the zlib library, we can perform the compression
> >> without even requiring gzip on the host machine.
> >
> > Very cool. It's nice to drop a dependency, and this should be a bit
> > more efficient, too.
>
> Getting rid of dependencies is good, and using zlib is the obvious way
> to generate .tgz files. Last time I tried something like that, a
> separate gzip process was faster, though -- at least on Linux [1]. How
> does this one fare?

I'd expect a separate gzip to be faster in wall-clock time on a
multi-core machine, but to consume more CPU overall. I'm slightly
surprised that your timings show it winning on total CPU, too.

Here are best-of-five times for "git archive --format=tar.gz HEAD" on
linux.git (the machine is a quad-core):

  [before, separate gzip]
  real    0m21.501s
  user    0m26.148s
  sys     0m0.619s

  [after, internal gzwrite]
  real    0m25.156s
  user    0m25.059s
  sys     0m0.096s

which does show what I expect (longer overall, but less total CPU).

Which one you prefer depends on your situation, of course. A user on a
workstation with multiple cores probably cares most about end-to-end
latency and using all of their available horsepower. A server hosting
repositories and receiving many unrelated requests probably cares more
about total CPU (though the differences there are small enough that it
may not even be worth having a config knob to un-parallelize it).

> Doing compression in its own thread may be a good idea.

Yeah. It might even make the patch simpler, since I'd expect it to be
implemented with start_async() and a descriptor, making it look just
like a gzip pipe to the caller. :)

-Peff
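For reference, a minimal sketch of the "internal gzwrite" approach
being measured above. This is hypothetical code, not the patch under
discussion; it assumes only zlib's gzdopen()/gzwrite()/gzclose(), with
die() being git's usual error helper and gzip_block() standing in for
wherever the tar writer hands off its data:

  #include <zlib.h>

  static gzFile gz;

  /* Point zlib's gzFile layer at the output descriptor. */
  static void gzip_open(int fd)
  {
          gz = gzdopen(fd, "wb");
          if (!gz)
                  die("gzdopen failed");
  }

  /* Compress one block of tar data in-process; no child gzip needed. */
  static void gzip_block(const void *buf, size_t len)
  {
          if (gzwrite(gz, buf, len) != (int)len)
                  die("gzwrite failed");
  }

  /* Flush remaining output and write the gzip trailer. */
  static void gzip_close(void)
  {
          if (gzclose(gz) != Z_OK)
                  die("gzclose failed");
  }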
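And a rough sketch of the threaded version, using start_async() from
run-command.h (again hypothetical; the .in/.out pipe semantics here are
from memory, and gzip_filter() is made up for illustration -- the real
patch may well look different):

  /* Runs in its own thread: read raw tar data, write gzipped data. */
  static int gzip_filter(int in, int out, void *data)
  {
          gzFile gz = gzdopen(out, "wb");
          char buf[8192];
          ssize_t len;

          if (!gz)
                  return error("gzdopen failed");
          while ((len = xread(in, buf, sizeof(buf))) > 0)
                  if (gzwrite(gz, buf, (unsigned)len) != (int)len)
                          return error("gzwrite failed");
          close(in);
          if (gzclose(gz) != Z_OK)
                  return error("gzclose failed");
          return len < 0 ? -1 : 0;
  }

  ...
  struct async filter;

  memset(&filter, 0, sizeof(filter));
  filter.proc = gzip_filter;
  filter.in = -1;   /* ask start_async() for a pipe we can write into */
  filter.out = 1;   /* compressed output goes straight to stdout */
  if (start_async(&filter))
          die("unable to start gzip filter");
  /* the tar writer then streams to filter.in, just like a pipe to gzip */
  ...
  close(filter.in);
  if (finish_async(&filter))
          die("gzip filter failed");

That would keep the compression off the main thread, so the wall-clock
gap in the timings above should shrink while keeping the dependency
drop.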