On Thu, Aug 16, 2018 at 11:08 PM Jeff King <peff@xxxxxxxx> wrote:
>
> On Thu, Aug 16, 2018 at 04:55:56PM -0400, Jeff King wrote:
>
> > >  * We spend the majority of the ~30s on this:
> > >    https://github.com/git/git/blob/63749b2dea5d1501ff85bab7b8a7f64911d21dea/pack-check.c#L70-L79
> >
> > This is hashing the actual packfile. This is potentially quite long,
> > especially if you have a ton of big objects.
> >
> > I wonder if we need to do this as a separate step anyway, though. Our
> > verification is based on index-pack these days, which means it's going
> > to walk over the whole content as part of the "Indexing objects" step to
> > expand base objects and mark deltas for later. Could we feed this hash
> > as part of that walk over the data? It's not going to save us 30s, but
> > it's likely to be more efficient. And it would fold the effort naturally
> > into the existing progress meter.
>
> Actually, I take it back. That's the nice, modern way we do it in
> git-verify-pack. But git-fsck uses the ancient "just walk over all of
> the idx entries method". It at least sorts in pack order, which is good,
> but:
>
>   - it's not multi-threaded, like index-pack/verify-pack
>
>   - the index-pack way is actually more efficient than pack-ordering for
>     the delta-base cache, because it actually walks the delta-graph in
>     the optimal order
>

I actually tried to make git-fsck use index-pack --verify at one point.
The only thing that stopped it from working was index-pack
automatically wrote the newer index version if I remember correctly,
and that would fail the final hash check. fsck performance was not a
big deal so I dropped it. Just saying it should be possible, if
someone's interested in that direction.
--
Duy
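
For anyone following along, the "hashing the actual packfile" step quoted above boils down to hashing every byte of the pack except its trailer and comparing the result with that trailer. Below is a rough Python sketch of the idea, not git's actual C code. It assumes a SHA-1 repository, where a pack ends with a 20-byte checksum of everything before it. The names pack_trailer_matches and chunk_size are made up for illustration.

```python
import hashlib

SHA1_LEN = 20  # assumption: SHA-1 pack, so the trailer is 20 bytes


def pack_trailer_matches(path, chunk_size=1 << 20):
    """Return True if the pack's trailing checksum equals the SHA-1
    of all the bytes that precede it."""
    hasher = hashlib.sha1()
    with open(path, "rb") as f:
        f.seek(0, 2)                      # jump to the end to learn the size
        remaining = f.tell() - SHA1_LEN   # bytes covered by the checksum
        if remaining < 0:
            return False                  # too short to even hold a trailer
        f.seek(0)
        # Stream the body in chunks so memory use stays flat for big packs.
        while remaining > 0:
            chunk = f.read(min(chunk_size, remaining))
            if not chunk:
                return False              # file shrank underneath us
            hasher.update(chunk)
            remaining -= len(chunk)
        trailer = f.read(SHA1_LEN)
    return hasher.digest() == trailer
```

The sketch reads the whole pack once just for this comparison, which is why the cost grows with pack size, and why folding the hash into a walk that already touches every byte looks attractive.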