Re: [PATCH 1/1] commit-graph.c: avoid unnecessary tag dereference when merging

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Tue, Mar 24, 2020 at 02:11:59AM -0400, Jeff King wrote:
> On Sun, Mar 22, 2020 at 09:47:49AM -0600, Taylor Blau wrote:
>
> > > > [1] I'm actually not quite sure about correctness here. It should be
> > > >     fine to generate a graph file without any given commit; readers will
> > > >     just have to load that commit the old-fashioned way. But at this
> > > >     phase of "commit-graph write", I think we'll already have done the
> > > >     close_reachable() check. What does it mean to throw away a commit at
> > > >     this stage? If we're the parent of another commit, then it will have
> > > >     trouble referring to us by a uint32_t. Will the actual writing phase
> > > >     barf, or will we generate an invalid graph file?
> > >
> > > It doesn't seem great. If I instrument Git like this to simulate an
> > > object temporarily "missing" (if it were really missing the whole repo
> > > would be corrupt; we're trying to see what would happen if a race causes
> > > us to momentarily not see it):
> >
> > This is definitely a problem on either side of this patch, which is
> > demonstrated by the fact that you applied your changes without my patch
> > on top (and that my patch isn't changing anything substantial in this
> > area like removing the 'continue' statement).
> >
> > Should we address this before moving on with my patch? I think that we
> > *could*, but I'd rather go forward with what we have for now, since it's
> > only improving the situation, and not introducing a new bug.
>
> I do agree it's a problem before your patch. But I think your patch may
> make it a lot more common, if only because it means we'd _actually_ be
> dropping entries for objects that went away, instead of accidentally
> keeping them due to re-using the graph result. So it probably is worth
> trying to deal with it now, or at least thinking hard about it.
>
> The trouble is that I'm not sure what _should_ happen. Aborting the
> whole commit-graph generation seems overboard (and would be annoying for
> cases where whole swaths of history became unreachable and went away;
> presumably we'd be dropping _all_ of those objects, and the write phase
> would be just fine). I have a feeling the correct solution is to do this
> merging pass earlier, before we check close_reachable(). Or if not a
> true merging pass, at least a pass to check which existing entries are
> still valid. But I don't know how painful that reordering would be.

Maybe, but I'm not sure that moving 'close_reachable' up would
necessarily solve all of our problems. Or, at least, that it wouldn't
create a set of new problems :).

Let's say that you have (1) some connected component of your history
that is written to a commit-graph, where (2) part of that cluster has
been dropped (e.g., due to corruption, a rogue 'git gc', etc), and (3)
you're writing a new graph with '--input=graphed'.

What is 'close_reachable' supposed to do? If some of the now-missing
commits are in the reachability closure of the commits that still exist
in the repository, then we're back to where we started. We *could* have
'close_reachable' take all missing commits and drop their ancestors and
descendants, but this creates quite the headache for 'close_reachable',
which now needs to keep track of such things.

I'm hopeful that this isn't so common in practice, but I'm also not
entirely sure one way or another. I can plan to deploy this patch to
GitHub's servers for a ~month and see if we experience it. If it turns
out to be common, I'll assume that others have this problem, too, in
which case we can go back and think more about it.

But, I'm hopeful that this problem is rare enough that it isn't worth
worrying about for anybody, GitHub or not.

> -Peff

Thanks,
Taylor



[Index of Archives]     [Linux Kernel Development]     [Gcc Help]     [IETF Annouce]     [DCCP]     [Netdev]     [Networking]     [Security]     [V4L]     [Bugtraq]     [Yosemite]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Linux SCSI]     [Fedora Users]

  Powered by Linux