Re: [PATCH 1/1] commit-graph.c: avoid unnecessary tag dereference when merging

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Tue, Mar 24, 2020 at 05:08:26PM -0600, Taylor Blau wrote:

> > The trouble is that I'm not sure what _should_ happen. Aborting the
> > whole commit-graph generation seems overboard (and would be annoying for
> > cases where whole swaths of history became unreachable and went away;
> > presumably we'd be dropping _all_ of those objects, and the write phase
> > would be just fine). I have a feeling the correct solution is to do this
> > merging pass earlier, before we check close_reachable(). Or if not a
> > true merging pass, at least a pass to check which existing entries are
> > still valid. But I don't know how painful that reordering would be.
> 
> Maybe, but I'm not sure that moving 'close_reachable' up would
> necessarily solve all of our problems. Or, at least, that it wouldn't
> create a set of new problems :).
> 
> Let's say that you have (1) some connected component of your history
> that is written to a commit-graph, where (2) part of that cluster has
> been dropped (e.g., due to corruption, a rogue 'git gc', etc), and (3)
> you're writing a new graph with '--input=graphed'.
> 
> What is 'close_reachable' supposed to do? If some of the now-missing
> commits are in the reachability closure of the commits that still exist
> in the repository, then we're back to where we started. We *could* have
> 'close_reachable' take all missing commits and drop their ancestors and
> descendants, but this creates quite the headache for 'close_reachable',
> which now needs to keep track of such things.

Yeah, I think my suggestion was dumb. I was thinking that
close_reachable() might somehow fix things up, but all it ever does with
the current code is _add_ commits to the graph. And we would want the
opposite: removing ones which can no longer be present, because their
ancestors are gone. That would be possible, but would be quite a pain
(you'd have to walk an inversion of the parent DAG).

> I'm hopeful that this isn't so common in practice, but I'm also not
> entirely sure one way or another. I can plan to deploy this patch to
> GitHub's servers for a ~month and see if we experience it. If it turns
> out to be common, I'll assume that others have this problem, too, in
> which case we can go back and think more about it.

It _shouldn't_ be too common, because it implies that we have commit X
but not its ancestor X^. And the pruning and repacking code tries to
avoid throwing away X^ in such a case. It could still happen due to a
race, though.

So yeah, I think we should mostly ignore it. It's not a new problem with
your series (and the only reason it is more common with your series is
that the old code was actually incorrectly handling some cases). It
might be worth turning the "missing parent" BUG() into a die(), since we
know it's reachable. But that's largely academic, since in either case
commit-graph is going to barf and fail to write.

As a follow-up to this part for anyone else following the discussion:

> I can plan to deploy this patch to GitHub's servers for a ~month and
> see if we experience it.

...I don't think we'll actually generate good data here. We're probably
going to end up doing our "big maintenance" commit-graph roll-ups by
just feeding --reachable as input, and dropping all of the existing
graphs.

-Peff



[Index of Archives]     [Linux Kernel Development]     [Gcc Help]     [IETF Annouce]     [DCCP]     [Netdev]     [Networking]     [Security]     [V4L]     [Bugtraq]     [Yosemite]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Linux SCSI]     [Fedora Users]

  Powered by Linux