Taylor Blau <me@xxxxxxxxxxxx> writes: > > However, the action of first looking up the commit graph file is not > > done everywhere in Git, especially if the type of the object at the time > > of lookup is not known. This means that in a repo corruption situation, > > a user may encounter an "object missing" error, attempt to fetch it, and > > still encounter the same error later when they reattempt their original > > action, because the object is present in the commit graph file but not in > > the object DB. > > I think the type of repository corruption here may be underspecified. Hmm...if you have any specific points you'd like me to elaborate on (or better yet, wording suggestions), please let me know. > You say that we have some object, say X, whose type is not known. So we > don't load the commit-graph, realize that X is missing, and then try and > fetch it. Yes. > In this scenario, is X actually in the commit-graph, but not > in the object database? Yes. > Further, if X is in the commit-graph, I assume > we do not look it up there because we first try and find its type, which > fails, so we assume we don't have it (despite it appearing corruptly in > the commit-graph)? > > I think that matches the behavior you're describing, but I want to make > sure that I'm not thinking of something else. Strictly speaking, we are not trying to find its type. We are trying to find the object itself. (One could argue that if we find out that an object is a commit, we can then ignore the packfile and go look up the commit graph file. I'm not so sure this is a good idea, but this is moot, I think - as far as I know, we currently don't do this.) But yes, if the object is not in the object DB, we assume we don't have it. > You discuss this a little bit in your commit message, but I wonder if we > should just die() here. I feel like we're trying to work around a > situation where the commit-graph is obviously broken because it refers > to commit objects that don't actually exist in the object store. Yeah, that seems to be the consensus. I've switched it to a fatal error. > A few thoughts in this area: > > - What situation provokes this to be true? I could imagine there is > some bug that we don't fully have a grasp of. But I wonder if it is > even easier to provoke than that, say by pruning some objects out of > the object store, then not rewriting the commit-graph, leaving some > of the references dangling. The fetching of promisor objects that are descendants of non-promisor objects. [1] I think that the rewriting of the commit graph happens on every repack, thus avoiding the situation you describe (unless there is a bug there). [1] https://lore.kernel.org/git/20241001191811.1934900-1-calvinwan@xxxxxxxxxx/ > - Does 'git fsck' catch this case within the commit-graph? Honestly, I haven't checked - I've been concentrating on fixing the fetch part for now (and also the bug that caused the missing commits [2]). [2] https://lore.kernel.org/git/cover.1729792911.git.jonathantanmy@xxxxxxxxxx/ > - Are the other areas of the code that rely on the assumption that all > entries in the commit-graph actually exist on disk? If so, are they > similarly broken? Yes, the fetch negotiation code. It is not "broken" in that it solely uses repo_parse_commit() which always checks the commit graph, so as long as the commit graph has everything we need, there will be no error. There might be other systems that rely both on the commit graph and the object DB, and thus have an inconsistent view (so, "similarly broken" as you describe it) but at least in the partial clone case, the severity of the issue is not as high as in "fetch", because these other systems can lazily fetch the missing commit and then proceed. > Another thought about this whole thing is that we essentially have a > code path that says: "I found this object from the commit-graph, but > don't know if I actually have it on disk, so mark it to be checked later > via has_object()". > > I wonder if it would be more straightforward to replace the call to > lookup_commit_in_graph() with a direct call to has_object() in the > deref_without_lazy_fetch() function, which I think would both (a) > eliminate the need for a new flag bit to be allocated, and (b) prevent > looking up the object twice. > > Thoughts? > > Thanks, > Taylor This would undo the optimization in 62b5a35a33 (fetch-pack: optimize loading of refs via commit graph, 2021-09-01), and also would not work without changes to the fetch negotiation code - I tried to describe it in the commit message, perhaps not very clearly, but the issue is that even if we emit "want X", the fetch negotiation code would emit "have X" (the X is the same in both), and at least for our JGit server at $DAYJOB, the combination of "want X" and "have X" results in the server sending an empty packfile (reasonable behavior, I think). (And I don't think the changes to the fetch negotiation code are worth it.)