On Fri, Jul 20, 2018 at 02:30:23AM -0700, Junio C Hamano wrote: > > The entries in shallow file says that history behind them may not > > exist in the repository due to its shallowness but history after > > them are supposed to be traversable (otherwise we have a repository > > corruption). It is true that an entry that itself no longer exists > > in this repository should not be in shallow file, as the presence of > > that entry breaks that promise the file is making---that commit > > ought to exist and it is safe to traverse down to it, so keeping the > > entry in the file is absolutely a wrong thing to do. > > > > But that does not automatically mean that just simply removing it > > makes the resulting repository good, does it? Wouldn't the solution > > for that corruption be to set a new entry to stop history traversal > > before reaching that (now-missing) commit? > > The above is overly pessimistic and worried about an impossible > situation, I would think. The reason why a commit that used to be > in the shallow file is being pruned during a "repack" is because it > has become unreachable. By definition, no future history traversal > that wants to enumerate reachable commits needs to be stopped from > finding that commits that are older than this commit being pruned > are missing by having this in the shallow list. If there is a ref > or a reflog entry from which such a problematic traversal starts at, > we wouldn't be pruing this commit in the first place, because the > commit has not become unreachable yet. > > So a repository does not become corrupt by pruning the commit *and* > removing it from the shallow file at the same time. Right. I think a lot of this is rethinking how shallow pruning works, too, which is not something Dscho is trying to change. The simplest argument (which I think Dscho has made elsewhere, too) is: this is necessary in the current shallow code when dropping objects. We do it therefore from prune, but miss the case when git-repack is run itself outside of git-gc. I do still think the gc/prune architecture is a bit muddled, but at this point in the discussion I feel OK saying that people running "git repack -ad" would not be upset to have their shallows pruned. But the patch is still not OK as-is because prune_shallow() requires the SEEN flag on each reachable object struct, which we have not set in the repack process (hence the failing test I posted earlier). So we need a solution for that, which may impact ideas about how the call works. E.g., some possible solutions are: - teach pack-objects to optionally trigger the shallow prune based on its internal walk - have repack use the just-completed pack as a hint about reachability - introduce a mechanism to trigger the shallow prune based on a commit-only reachability check, and run that from repack (or from gc and document that it must be run if you are using repack as a manual gc replacement) I'm not advocating any particular solution there, but just showing that there's an array of them (and probably more that I didn't mention). -Peff