On Mon, Jun 10, 2019 at 12:44:54PM +0200, René Scharfe wrote:

> Am 01.05.19 um 20:18 schrieb Jeff King:
> > On Wed, May 01, 2019 at 07:45:05PM +0200, René Scharfe wrote:
> >
> >>> But since the performance is still not quite on par with `gzip`, I would
> >>> actually rather not, and really, just punt on that one, stating that
> >>> people interested in higher performance should use `pigz`.
> >>
> >> Here are my performance numbers for generating .tar.gz files again:
>
> OK, tried one more version, with pthreads (patch at the end).  Also
> redid all measurements for better comparability; everything is faster
> now for some reason (perhaps due to a compiler update? clang version
> 7.0.1-8 now):

Hmm. Interesting that using pthreads is still slower than just shelling
out to gzip:

> master, using gzip(1):
> Benchmark #1: git archive --format=tgz HEAD
>   Time (mean ± σ):     15.697 s ±  0.246 s    [User: 19.213 s, System: 0.386 s]
>   Range (min … max):   15.405 s … 16.103 s    10 runs
> [...]
> using zlib in a separate thread (that's the new one):
> Benchmark #1: git archive --format=tgz HEAD
>   Time (mean ± σ):     16.310 s ±  0.237 s    [User: 20.075 s, System: 0.173 s]
>   Range (min … max):   15.983 s … 16.790 s    10 runs

I wonder if zlib is just slower. Or if the cost of context switching is
somehow higher than just dumping big chunks over a pipe. In particular,
our gzip-alike is still faster than pthreads:

> using a gzip-lookalike:
> Benchmark #1: git archive --format=tgz HEAD
>   Time (mean ± σ):     16.289 s ±  0.218 s    [User: 19.485 s, System: 0.337 s]
>   Range (min … max):   16.020 s … 16.555 s    10 runs

though it looks like the timings do overlap.

> > At GitHub we certainly do cache the git-archive output. We'd also be
> > just fine with the sequential solution. We generally turn down
> > pack.threads to 1, and keep our CPUs busy by serving multiple users
> > anyway.
> >
> > So whatever has the lowest overall CPU time is generally preferable, but
> > the times are close enough that I don't think we'd care much either way
> > (and it's probably not worth having a config option or similar).
>
> Moving back to 2009 and reducing the number of utilized cores both feels
> weird, but the sequential solution *is* the most obvious, easiest and
> (by a narrow margin) lightest one if gzip(1) is not an option anymore.

It sounds like we resolved to give the "internal gzip" its own name
(whether it's a gzip-alike command, or a special name we recognize to
trigger the internal code). So maybe we could continue to default to
"gzip -cn", but platforms could do otherwise when shipping gzip there is
a pain (i.e. Windows, but maybe also anybody else who wants to set
NO_EXTERNAL_GZIP or detect it from autoconf).

-Peff
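
To make the proposal above a bit more concrete, here is a minimal sketch
of picking the default at build time. NO_EXTERNAL_GZIP is the knob
floated above; everything else (the DEFAULT_TGZ_FILTER macro, the two
helper functions, and the name "git archive gzip" for the internal
implementation) is hypothetical, made up for illustration, and not taken
from any patch in this thread.

```c
#include <string.h>

/*
 * Hypothetical special name that would select the internal,
 * zlib-based compressor instead of spawning an external process.
 */
#define INTERNAL_GZIP_NAME "git archive gzip"

/*
 * Keep "gzip -cn" as the default unless the platform was built
 * with NO_EXTERNAL_GZIP (e.g. because shipping gzip is a pain).
 */
#ifdef NO_EXTERNAL_GZIP
#define DEFAULT_TGZ_FILTER INTERNAL_GZIP_NAME
#else
#define DEFAULT_TGZ_FILTER "gzip -cn"
#endif

/* Return the filter used for tgz output when nothing is configured. */
const char *default_tgz_filter(void)
{
	return DEFAULT_TGZ_FILTER;
}

/* Return 1 if the given filter names the internal implementation. */
int is_internal_gzip(const char *filter)
{
	return !strcmp(filter, INTERNAL_GZIP_NAME);
}
```

With something along these lines, a caller would check
is_internal_gzip() before deciding whether to spawn the filter command,
and a user could still opt into either path by setting the filter
explicitly.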