Re: non-smooth progress indication for git fsck and git gc

On Thu, Aug 16, 2018 at 04:55:56PM -0400, Jeff King wrote:

> >  * We spend the majority of the ~30s on this:
> >    https://github.com/git/git/blob/63749b2dea5d1501ff85bab7b8a7f64911d21dea/pack-check.c#L70-L79
> 
> This is hashing the actual packfile. This is potentially quite long,
> especially if you have a ton of big objects.
> 
> I wonder if we need to do this as a separate step anyway, though. Our
> verification is based on index-pack these days, which means it's going
> to walk over the whole content as part of the "Indexing objects" step to
> expand base objects and mark deltas for later. Could we feed this hash
> as part of that walk over the data? It's not going to save us 30s, but
> it's likely to be more efficient. And it would fold the effort naturally
> into the existing progress meter.

Actually, I take it back. That's the nice, modern way we do it in
git-verify-pack. But git-fsck uses the ancient "just walk over all of
the idx entries" method. It at least sorts in pack order, which is good,
but:

  - it's not multi-threaded, like index-pack/verify-pack

  - the index-pack way is actually more efficient than pack-ordering for
    the delta-base cache, because it actually walks the delta-graph in
    the optimal order
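
To illustrate the second point, here is a toy model (not git's actual
delta-base cache or index-pack code; every name in it is made up for
illustration). It counts how many objects have to be reconstructed when
walking in a given order with a small LRU cache of expanded bases. The
interleaved pack layout is deliberately contrived; real packs usually
keep deltas close to their bases, so the real-world gap is likely
smaller.

```python
from collections import OrderedDict


def reconstructions(order, base_of, cache_size):
    """Count object reconstructions when visiting `order` with an LRU cache.

    `base_of` maps a delta object to its base; non-delta objects are absent.
    This is a hypothetical model, not git's implementation.
    """
    cache = OrderedDict()
    count = 0

    def resolve(obj):
        nonlocal count
        if obj in cache:
            cache.move_to_end(obj)  # cache hit: mark as recently used
            return
        base = base_of.get(obj)
        if base is not None:
            resolve(base)  # a delta needs its base expanded first
        count += 1  # expand this object (read it or apply its delta)
        cache[obj] = True
        if len(cache) > cache_size:
            cache.popitem(last=False)  # evict the least recently used entry

    for obj in order:
        resolve(obj)
    return count


def delta_graph_order(objects, base_of):
    """Depth-first walk from each non-delta object through its deltas."""
    children = {}
    roots = []
    for obj in objects:
        base = base_of.get(obj)
        if base is None:
            roots.append(obj)
        else:
            children.setdefault(base, []).append(obj)
    out = []
    stack = list(reversed(roots))
    while stack:
        obj = stack.pop()
        out.append(obj)
        stack.extend(reversed(children.get(obj, [])))
    return out


# Two delta chains (a0 <- a1 <- a2, b0 <- b1 <- b2), interleaved in the pack.
base_of = {"a1": "a0", "a2": "a1", "b1": "b0", "b2": "b1"}
pack_order = ["a0", "b0", "a1", "b1", "a2", "b2"]
graph_order = delta_graph_order(pack_order, base_of)

pack_cost = reconstructions(pack_order, base_of, cache_size=2)    # 11
graph_cost = reconstructions(graph_order, base_of, cache_size=2)  # 6
```

In this model the delta-graph walk expands each object exactly once,
while the pack-order walk keeps evicting bases it needs again later.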

Once upon a time verify-pack used this same pack-check code, and we
switched in 3de89c9d42 (verify-pack: use index-pack --verify,
2011-06-03). So I suspect the right thing to do here is for fsck to
switch to that, too, and then delete pack-check.c entirely.

That may well involve us switching the progress to a per-pack view
(e.g., "checking pack 1/10 (10%)", followed by its own progress meter).
But I don't think that would be a bad thing. It's a more realistic view
of the work we're actually doing. Although perhaps it's misleading about
the total work remaining, because not all packs are the same size (so
you see that you're halfway through pack 2/10, but that's quite likely
to be half of the total work if it's the one gigantic pack).
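
A rough sketch of that mismatch, assuming we know each pack's size up
front (the function name and the sizes are hypothetical, and this is not
how git's progress code is structured). It produces the per-pack label
from the example above alongside a byte-weighted overall fraction:

```python
def progress_views(pack_sizes, current_index, bytes_done_in_current):
    """Return (per-pack label, overall fraction of bytes checked).

    `pack_sizes` is a list of pack sizes in bytes; `current_index` is
    0-based. Hypothetical helper for illustration only.
    """
    total = sum(pack_sizes)
    done_before = sum(pack_sizes[:current_index])
    per_pack = bytes_done_in_current / pack_sizes[current_index]
    overall = (done_before + bytes_done_in_current) / total
    label = (
        f"checking pack {current_index + 1}/{len(pack_sizes)} "
        f"({per_pack:.0%})"
    )
    return label, overall


# Ten packs, where the second one is the gigantic one.
sizes = [10, 900, 10, 10, 10, 10, 10, 10, 10, 10]
label, overall = progress_views(sizes, current_index=1,
                                bytes_done_in_current=450)
# label reads "checking pack 2/10 (50%)", while overall is 460/990,
# i.e. a bit under half of all the bytes.
```

Showing the byte-weighted figure next to the per-pack one would be one
way to keep the "pack 2/10" line from understating how far along we are.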

-Peff


