Re: [PATCH 2/2] repack -ad: prune the list of shallow commits

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Fri, Jul 20, 2018 at 02:30:23AM -0700, Junio C Hamano wrote:

> > The entries in shallow file says that history behind them may not
> > exist in the repository due to its shallowness but history after
> > them are supposed to be traversable (otherwise we have a repository
> > corruption).  It is true that an entry that itself no longer exists
> > in this repository should not be in shallow file, as the presence of
> > that entry breaks that promise the file is making---that commit
> > ought to exist and it is safe to traverse down to it, so keeping the
> > entry in the file is absolutely a wrong thing to do.
> >
> > But that does not automatically mean that just simply removing it
> > makes the resulting repository good, does it?  Wouldn't the solution
> > for that corruption be to set a new entry to stop history traversal
> > before reaching that (now-missing) commit?
> 
> The above is overly pessimistic and worried about an impossible
> situation, I would think.  The reason why a commit that used to be
> in the shallow file is being pruned during a "repack" is because it
> has become unreachable.  By definition, no future history traversal
> that wants to enumerate reachable commits needs to be stopped from
> finding that commits that are older than this commit being pruned
> are missing by having this in the shallow list.  If there is a ref
> or a reflog entry from which such a problematic traversal starts at,
> we wouldn't be pruing this commit in the first place, because the
> commit has not become unreachable yet.
> 
> So a repository does not become corrupt by pruning the commit *and*
> removing it from the shallow file at the same time.

Right. I think a lot of this is rethinking how shallow pruning works,
too, which is not something Dscho is trying to change. The simplest
argument (which I think Dscho has made elsewhere, too) is: this is
necessary in the current shallow code when dropping objects. We do it
therefore from prune, but miss the case when git-repack is run itself
outside of git-gc.

I do still think the gc/prune architecture is a bit muddled, but at this
point in the discussion I feel OK saying that people running "git repack
-ad" would not be upset to have their shallows pruned.

But the patch is still not OK as-is because prune_shallow() requires the
SEEN flag on each reachable object struct, which we have not set in the
repack process (hence the failing test I posted earlier).  So we need a
solution for that, which may impact ideas about how the call works.
E.g., some possible solutions are:

 - teach pack-objects to optionally trigger the shallow prune based on
   its internal walk

 - have repack use the just-completed pack as a hint about reachability

 - introduce a mechanism to trigger the shallow prune based on a
   commit-only reachability check, and run that from repack (or from gc
   and document that it must be run if you are using repack as a manual
   gc replacement)

I'm not advocating any particular solution there, but just showing that
there's an array of them (and probably more that I didn't mention).

-Peff



[Index of Archives]     [Linux Kernel Development]     [Gcc Help]     [IETF Annouce]     [DCCP]     [Netdev]     [Networking]     [Security]     [V4L]     [Bugtraq]     [Yosemite]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Linux SCSI]     [Fedora Users]

  Powered by Linux