Re: [PATCH] commit: detect commits that exist in commit-graph but not in the ODB

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Tue, Oct 17, 2023 at 11:34:53AM -0700, Junio C Hamano wrote:
> Patrick Steinhardt <ps@xxxxxx> writes:
> 
> > Fair point indeed. The following is a worst-case scenario benchmark of
> > of the change where we do a full topological walk of all reachable
> > commits in the graph, executed in linux.git. We parse commit parents via
> > `repo_parse_commit_gently()`, so the new code path now basically has to
> > check for object existence of every reachable commit:
> > ...
> > The added check does lead to a performance regression indeed, which is
> > not all that unexpected. That being said, the commit-graph still results
> > in a significant speedup compared to the case where we don't have it.
> 
> Yeah, I agree that both points are expected.  An extra check that is
> much cheaper than the full parsing is paying a small price to be a
> bit more careful than before.  The question is if the price is small
> enough.  I am still not sure if the extra carefulness is warranted
> for all normal cases to spend 30% extra cycles.
> 
> Thanks.

Well, seen in contexts like the thread that spawned this discussion I
think it's preferable to take a relatively smaller price compared to
disabling the commit graph entirely in some situations. With that in
mind, personally I'd be happy with either of two outcomes here:

    - We take the patch I proposed as a hardening measure, which allows
      us to use the commit graph in basically any situation where we
      could parse objects from the ODB without any downside except for a
      performance hit.

    - We say that corrupt repositories do not need to be accounted for
      when using metadata caches like the commit-graph. That would mean
      in consequence that for the `--missing` flag we would not have to
      disable commit graphs.

The test failure that was exposed in Karthik's test only stems from the
fact that we manually corrupt the repository and is not specific to the
`--missing` flag. This is further stressed by the new test aded in my
own patch, as we can trigger a similar bug when not using `--missing`.

For end users, object pruning would happen either via git-gc(1) or via
git-maintenance(1), and both of them do know to update commit graphs
already. This should significantly decrease the chance that such stale
commit graphs would be found out there in the wild, but it's still not
impossible of course. Especially in cases where you use lower-level
commands like git-repack(1) with cruft packs or git-prune(1), which is
often the case for Git hosters, the risk is higher though.

Patrick

Attachment: signature.asc
Description: PGP signature


[Index of Archives]     [Linux Kernel Development]     [Gcc Help]     [IETF Annouce]     [DCCP]     [Netdev]     [Networking]     [Security]     [V4L]     [Bugtraq]     [Yosemite]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Linux SCSI]     [Fedora Users]

  Powered by Linux