Re: [PATCH v3 4/4] revision: avoid hitting packfiles when commits are in commit-graph

Ævar Arnfjörð Bjarmason <avarab@xxxxxxxxx> · Wed, 04 Aug 2021 12:51:18 +0200

On Tue, Aug 03 2021, Patrick Steinhardt wrote:

> I wonder what our stance on this is. I can definitely understand the
> angle that this would be a deal breaker given that we now claim commits
> exist which don't anymore. On the other hand, we update commit-graphs
> via git-gc(1), which makes this scenario a lot less likely nowadays. Is
> there any precedent in our codebase where we treat commits part of the
> commit-graph as existing? If not, do we want to make that assumption?

I don't think there is, but don't see why given the performance benefits
it should not at least be exposed in this form for those that think they
know what they're doing.

But right now the way we write the commit graph seems guaranteed to
produce races in this area, i.e. we expire objects first, and then we
write a new graph (see cmd_gc()). I.e. something like:

 1. (Re)pack reachable objects
 2. Prune unrechable and old
 3. Write a new commit graph using discovered tips

I don't see a good reason other than just that this needs some
refactoring to not instead do:

 1. Discover reachable refs
 2. (Re)pack reachable objects from those refs
 3. Write a new commit graph using those ref tips, i.e. don't re-run
    for_each_ref() in write_commit_graph_reachable() but call
    write_commit_graph() with those OIDs directly.
 4. Prune unreachable and old

Which I think would nicely get around this particular race condition,
which also doesn't exist if you call "git commit-graph write", it's just
if we write a new graph and then expire objects without being aware that
the graph needs updating.