On Sun, Jun 02, 2019 at 06:58:48PM +0200, René Scharfe wrote:

> > That sounds about right. It's basically every version of every tree that
> > has a symlink. Did it make a noticeable difference in timing? Indexing
> > the whole kernel history is already a horribly slow process. :)
>
> Right, I didn't notice a difference -- no patience for watching that
> thing to the end. But here are some numbers for v2.21.0 vs. master with
> the patch:
>
> Benchmark #1: git fsck
>   Time (mean ± σ):     307.775 s ± 9.054 s    [User: 307.173 s, System: 0.448 s]
>   Range (min … max):   294.052 s … 322.931 s    10 runs
>
> Benchmark #2: ~/src/git/git fsck
>   Time (mean ± σ):     319.754 s ± 2.255 s    [User: 318.927 s, System: 0.671 s]
>   Range (min … max):   316.376 s … 323.747 s    10 runs
>
> Summary
>   'git fsck' ran
>     1.04 ± 0.03 times faster than '~/src/git/git fsck'

I guess that's about what I'd expect. The bulk of the time in most repos
will go to fscking the actual blobs, I'd think. But hitting each tree
twice really is noticeable.

> Seeing only a single CPU core being stressed for that long is a bit sad
> to see. Checking individual objects should be relatively easy to
> parallelize, shouldn't it?

Yes. The fsck code is pretty old, and uses a very simple way of walking
over all of the packs. index-pack (which backs verify-pack these days)
is much smarter, and runs in parallel. It still takes a lock when doing
the actual fsck checks, but most of the time goes to the zlib inflation
and delta reconstruction.
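
To make that split concrete, here is a rough sketch of the general
pattern (inflate in parallel, serialize only the check). It is not
git's code: check_objects(), looks_valid() and the report dict are
hypothetical names made up for illustration, and the "check" is a
trivial stand-in for real fsck logic.

```python
import threading
import zlib
from concurrent.futures import ThreadPoolExecutor


def check_objects(compressed_objects, workers=4):
    """Inflate objects in parallel; run a (hypothetical) check under a lock."""
    check_lock = threading.Lock()
    report = {"checked": 0, "bad": []}  # shared state, guarded by check_lock

    def looks_valid(data):
        # Hypothetical stand-in for a real fsck check: reject empty payloads.
        return len(data) > 0

    def work(item):
        index, blob = item
        try:
            data = zlib.decompress(blob)  # expensive part, done without the lock
        except zlib.error:
            data = None  # corrupt stream; reported as bad below
        with check_lock:  # cheap part, serialized across workers
            report["checked"] += 1
            if data is None or not looks_valid(data):
                report["bad"].append(index)

    with ThreadPoolExecutor(max_workers=workers) as pool:
        # list() drains the iterator so a worker exception is re-raised here.
        list(pool.map(work, enumerate(compressed_objects)))

    report["bad"].sort()  # workers finish in arbitrary order
    return report
```

The only point is the shape: the lock covers the bookkeeping, not the
inflation, so the serialized section stays small relative to the work
done per object.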

There's some discussion in:

  https://public-inbox.org/git/20180816210657.GA9291@xxxxxxxxxxxxxxxxxxxxx/

and even some patches elsewhere in the thread here:

  https://public-inbox.org/git/20180902075528.GC18787@xxxxxxxxxxxxxxxxxxxxx/

and here:

  https://public-inbox.org/git/20180902085503.GA25391@xxxxxxxxxxxxxxxxxxxxx/

I think the big show-stopper there is how ugly it is to run the pack
verification in a separate process (and I suspect it is not just ugly
from a code point of view, but actively breaks index-pack because it
then relies on the set of objects seen during the first phase to do its
connectivity check). So there would probably need to be some
lib-ification work on index-pack first, so that we could call it (at
least in verification mode) multiple times from inside fsck.

-Peff