On Thu, Nov 23, 2017 at 11:35:21AM +0900, Junio C Hamano wrote: > Not limiting us to the caller in the "fetch" codepath, I think the > expectation by callers of lookup_commit_reference_gently() in the > ideal world would be: > > - It has an object name, and wants to use it as point in the commit > DAG to define the traversal over the DAG, if it refers to a > commit known to us. > > - It does not know if these object names represent a tag object, a > commit object, or some other object. It does not know if the > local repository actually has them (e.g. we received a "have" > from the other side---missing is expected). > > - Hence, it would happily accept a NULL as "we do not have it" and > "we do have it, but it is not a commit-ish". > > And from that point of view, 2, 3a (missing), and 4 (0{40}) to yield > NULL is perfectly fine. 3b (exists but broken) may be a noteworthy > event, but for the purpose of the caller, it may want to proceed as > if the object is missing from our end, so it might deserve warning() > but not die(), at least as the default behaviour. Hmm, yeah, I did not differentiate 3a and 3b in my analysis. How we'd want to handle "missing" would vary from caller to caller, I think. Certainly for this case in "fetch" where it's the "old" value for a ref, it would be a corruption not to have it. Just grepping around a few of the other callers, I see: - archive.c:parse_treeish_arg: fine not to have it (we'd complain soon after that it doesn't point to a tree either). But also fine to complain hard. - blame.c:dwim_reverse_initial, and checkout_paths and switch_branches in checkout.c: missing object would mean a broken HEAD - show-branch.c:append_ref: missing would mean a broken ref - clear_commit_marks_for_object_array: conceptually OK to have a missing object, though I suspect practically impossible since we examined and marked the objects earlier - ref-filter's --merged, --contains, etc: the low-level code quietly ignores a missing object or non-commit, but the command-line parser enforces that we find a commit. So probably impossible to trigger, but arguably the second call should be a BUG(). So I dunno. That is far from exhaustive, but it does seem like most cases should assume the presence of the file. As for your proposed behavior: > And from that point of view, 2, 3a (missing), and 4 (0{40}) to yield > NULL is perfectly fine. 3b (exists but broken) may be a noteworthy > event, but for the purpose of the caller, it may want to proceed as > if the object is missing from our end, so it might deserve warning() > but not die(), at least as the default behaviour. That's actually not far from what happens now, with the only difference being that we _do_ actually die() on a corruption (really any error except ENOENT). I forgot that when I wrote my earlier message. You can see it by updating the "fetch" reproduction I sent earlier to corrupt: -- >8 -- git init parent git -C parent commit --allow-empty -m base git clone parent child git -C parent commit --allow-empty -m more cd child for i in .git/objects/??/* do chmod +w $i echo corrupt >$i done git fetch -- 8< -- which gives something like: error: inflate: data stream error (incorrect header check) error: unable to unpack 55c66a401fd4190382f9cd8b70c11f9f5adb044e header fatal: loose object 55c66a401fd4190382f9cd8b70c11f9f5adb044e (stored in .git/objects/55/c66a401fd4190382f9cd8b70c11f9f5adb044e) is corrupt So that's good. I do still think that many of the callers of lookup_commit_reference_gently() probably ought to die() in the "missing" case rather than continue (and produce subtly wrong answers). But it may not be that big a deal. For the most part, all bets are off in a corrupt repo. My main concern is just that we do not want the corruption to spread or to make it harder for us to recover from (and that includes allowing us to delete or overwrite other data that would otherwise be forbidden, which is what's happening in the fetch case). Most of the callers probably don't fall into that situation, but it might be nice to err on the side of caution. -Peff