I recently dug into a performance problem running "git fetch" in a
repository with 5000 packs. Now obviously that's a silly number of packs
to have, but I did find some pretty low-hanging fruit. Most of the time
was spent in pointlessly re-scanning the objects/pack directory.

This series has two fixes, along with a perf test that covers this case.
I think the perf test is especially important here because we actually
fixed one of these cases (patch 4) already, but later regressed it
without noticing. The perf suite could have caught that (and would after
this series).

There are numbers in the individual commits, but here's the before/after
for the whole series:

  Test            origin            HEAD
  --------------------------------------------------------
  5551.4: fetch   5.48(4.99+0.50)   0.14(0.09+0.05) -97.4%

That's on a somewhat-contrived setup meant to maximize the chance of
noticing a regression. But I do think these re-scans probably kick in
for normal situations; they just aren't quite expensive enough for
anybody to have noticed and dug into it.

  [1/5]: p5550: factor out nonsense-pack creation
  [2/5]: t/perf/lib-pack: use fast-import checkpoint to create packs
  [3/5]: p5551: add a script to test fetch pack-dir rescans
  [4/5]: everything_local: use "quick" object existence check
  [5/5]: sha1_file: don't re-scan pack directory for null sha1

 fetch-pack.c                 |  3 ++-
 sha1_file.c                  |  3 +++
 t/perf/lib-pack.sh           | 25 ++++++++++++++++++++
 t/perf/p5550-fetch-tags.sh   | 25 ++------------------
 t/perf/p5551-fetch-rescan.sh | 55 ++++++++++++++++++++++++++++++++++++++++++++
 5 files changed, 87 insertions(+), 24 deletions(-)
 create mode 100644 t/perf/lib-pack.sh
 create mode 100755 t/perf/p5551-fetch-rescan.sh

-Peff