On Tue, Mar 24, 2020 at 02:11:59AM -0400, Jeff King wrote:
> On Sun, Mar 22, 2020 at 09:47:49AM -0600, Taylor Blau wrote:
>
> > > > [1] I'm actually not quite sure about correctness here. It should be
> > > > fine to generate a graph file without any given commit; readers will
> > > > just have to load that commit the old-fashioned way. But at this
> > > > phase of "commit-graph write", I think we'll already have done the
> > > > close_reachable() check. What does it mean to throw away a commit at
> > > > this stage? If we're the parent of another commit, then it will have
> > > > trouble referring to us by a uint32_t. Will the actual writing phase
> > > > barf, or will we generate an invalid graph file?
> > >
> > > It doesn't seem great. If I instrument Git like this to simulate an
> > > object temporarily "missing" (if it were really missing the whole repo
> > > would be corrupt; we're trying to see what would happen if a race
> > > causes us to momentarily not see it):
> >
> > This is definitely a problem on either side of this patch, which is
> > demonstrated by the fact that you applied your changes without my patch
> > on top (and that my patch isn't changing anything substantial in this
> > area like removing the 'continue' statement).
> >
> > Should we address this before moving on with my patch? I think that we
> > *could*, but I'd rather go forward with what we have for now, since it's
> > only improving the situation, and not introducing a new bug.
>
> I do agree it's a problem before your patch. But I think your patch may
> make it a lot more common, if only because it means we'd _actually_ be
> dropping entries for objects that went away, instead of accidentally
> keeping them due to re-using the graph result. So it probably is worth
> trying to deal with it now, or at least thinking hard about it.
>
> The trouble is that I'm not sure what _should_ happen. Aborting the
> whole commit-graph generation seems overboard (and would be annoying for
> cases where whole swaths of history became unreachable and went away;
> presumably we'd be dropping _all_ of those objects, and the write phase
> would be just fine). I have a feeling the correct solution is to do this
> merging pass earlier, before we check close_reachable(). Or if not a
> true merging pass, at least a pass to check which existing entries are
> still valid. But I don't know how painful that reordering would be.

Maybe, but I'm not sure that moving 'close_reachable' up would
necessarily solve all of our problems. Or, at least, that it wouldn't
create a set of new problems :).

Let's say that you have (1) some connected component of your history
that is written to a commit-graph, where (2) part of that cluster has
been dropped (e.g., due to corruption, a rogue 'git gc', etc.), and (3)
you're writing a new graph with '--input=graphed'.

What is 'close_reachable' supposed to do? If some of the now-missing
commits are in the reachability closure of the commits that still exist
in the repository, then we're back to where we started. We *could* have
'close_reachable' take all missing commits and drop their ancestors and
descendants, but that creates quite a headache for 'close_reachable',
which would then need to keep track of such things.

I'm hopeful that this isn't so common in practice, but I'm also not
entirely sure one way or the other. I plan to deploy this patch to
GitHub's servers for a ~month and see whether we run into it.
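Just to make the idea concrete: if we do end up needing the "check which
existing entries are still valid" pass you describe, I'd imagine it
looking something like the sketch below. This is only a rough sketch,
not code from the series: the function name is made up for illustration,
and it assumes the commit OIDs taken from the existing graph have
already been collected into an oidset (that part is elided). It simply
drops any OID whose object has since gone missing, before
close_reachable() ever sees it.

/*
 * Sketch only: given the commit OIDs read out of the existing
 * commit-graph, keep only those whose objects are still present in the
 * repository. Anything dropped here is simply not written to the new
 * graph; readers will parse those commits the old-fashioned way.
 */
#include "cache.h"
#include "object-store.h"
#include "oidset.h"

static void keep_only_existing_objects(struct oidset *graphed,
                                       struct oidset *keep)
{
        struct oidset_iter iter;
        struct object_id *oid;

        oidset_iter_init(graphed, &iter);
        while ((oid = oidset_iter_next(&iter))) {
                if (!has_object_file(oid))
                        continue; /* object went away; drop the entry */
                oidset_insert(keep, oid);
        }
}

Of course, whether that is cheap enough (it costs an object existence
check per graphed commit) is a separate question from how painful it
would be to reorder it ahead of close_reachable().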
If it turns out to be common, I'll assume that others have this problem,
too, in which case we can go back and think more about it. But I'm
hopeful that this problem is rare enough that it isn't worth worrying
about for anybody, GitHub or not.

> -Peff

Thanks,
Taylor