Re: commit-graph paranoia performance, was Re: [ANNOUNCE] Git v2.43.0-rc1

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Mon, Nov 13, 2023 at 03:55:38PM -0500, Jeff King wrote:
> On Thu, Nov 09, 2023 at 02:33:54AM +0900, Junio C Hamano wrote:
> 
> >  * The codepath to traverse the commit-graph learned to notice that a
> >    commit is missing (e.g., corrupt repository lost an object), even
> >    though it knows something about the commit (like its parents) from
> >    what is in commit-graph.
> >    (merge 7a5d604443 ps/do-not-trust-commit-graph-blindly-for-existence later to maint).
> 
> I happened to be timing "rev-list" for an unrelated topic today, and I
> noticed that this change had a rather large effect. The commit message
> for 7a5d604443 claims a 30% performance regression. But that's when
> using "--topo-order", and actually writing out the result.
> 
> Running "rev-list --count" on a copy of linux.git with a fully-built
> commit-graph shows that the run-time doubles:
> 
>   Benchmark 1: git.v2.42.1 rev-list --count HEAD
>     Time (mean ± σ):     658.0 ms ±   5.2 ms    [User: 613.5 ms, System: 44.4 ms]
>     Range (min … max):   650.2 ms … 666.0 ms    10 runs
>   
>   Benchmark 2: git.v2.43.0-rc1 rev-list --count HEAD
>     Time (mean ± σ):      1.333 s ±  0.019 s    [User: 1.263 s, System: 0.069 s]
>     Range (min … max):    1.302 s …  1.361 s    10 runs
>   
>   Summary
>     git.v2.42.1 rev-list --count HEAD ran
>       2.03 ± 0.03 times faster than git.v2.43.0-rc1 rev-list --count HEAD

Ah, indeed. I thought I already benchmarked the worst-case behaviour by
simply doing a full graph walk, but of course the performance hit is
even worse when not outputting the commits at all but only counting
them.

> Now in defense of that patch, this particular command is going to be one
> of the most sensitive in terms of percent change, simply because it
> isn't doing much besides walking the commits. And 650ms isn't _that_ big
> in an absolute sense. But it also doesn't quite feel like nothing, even
> tacked onto a command that might otherwise take 1000ms to run.
> 
> Should we default GIT_COMMIT_GRAPH_PARANOIA to "0"? Yes, some operations
> might miss a breakage, but that is true of so much of Git. For day to
> day commands we generally assume that the repository is not corrupted,
> and avoid looking at any data we can. Other commands (like "commit-graph
> verify", but maybe others) would probably want to be more careful
> (either by checking this case explicitly, or by enabling the paranoia
> flag themselves).
> 
> -Peff

I'd be fine with that as a follow-up change, yes. I agree that in
general we shouldn't see this kind of corruption, and it's good that the
behaviour can be toggled so easily now.

I'm happy to write that patch if you don't plan to.

Patrick

Attachment: signature.asc
Description: PGP signature


[Index of Archives]     [Linux Kernel Development]     [Gcc Help]     [IETF Annouce]     [DCCP]     [Netdev]     [Networking]     [Security]     [V4L]     [Bugtraq]     [Yosemite]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Linux SCSI]     [Fedora Users]

  Powered by Linux