Jeff King <peff@xxxxxxxx> writes: > This fixes a regression in 7c0fe330d5 (rev-list: handle missing tree > objects properly, 2018-10-05) where rev-list will now complain about the > empty tree when it doesn't physically exist on disk. > > Before that commit, we relied on the traversal code in list-objects.c to > walk through the trees. Since it uses parse_tree(), we'd do a normal > object lookup that includes looking in the set of "cached" objects > (which is where our magic internal empty-tree kicks in). > > After that commit, we instead tell list-objects.c not to die on any > missing trees, and we check them ourselves using has_object_file(). But > that function uses OBJECT_INFO_SKIP_CACHED, which means we won't use our > internal empty tree. Yikes. Thanks for spotting. > This patch makes the minimal fix, which is to swap out a direct call to > oid_object_info_extended(), minus the SKIP_CACHED flag, instead of > calling has_object_file(). This is all that has_object_file() is doing > under the hood. And there's little danger of unrelated fallout from > other unexpected "cached" objects, since there's only one call site that > ends such a cached object, and it's in git-blame. OK. That last one is not even "cached" but "merely exists in-core" virtual commit, that represents the locally modified state, right? I do not think rev-list ever asks if these object exist, but if they were asked, we should say they also exist. > Signed-off-by: Jeff King <peff@xxxxxxxx> > --- > I prepared this directly on top of 7c0fe330d5, but it should merge > cleanly into the current tip of master. > > I think we might also consider just having has_object_file() respect > cached objects. The SKIP_CACHED flag comes from Jonathan Tan's > e83e71c5e1 (sha1_file: refactor has_sha1_file_with_flags, 2017-06-21). > But it was just matching the old behavior; it's not clear to me that we > particularly care about that, and it wasn't simply that nobody bothered > to put the cached-object check into has_sha1_file(). Yup, I am fine with such a clean-up after we fix this regression. > Some concerns/arguments against it: > > - we probably would want to make sure we do not short-cut > write_sha1_file(). I.e., we should still write it to disk when > somebody wants it. But I think that works, because that function > uses its own check-and-freshen infrastructure. > > - some callers of has_sha1_file() might care about durability between > processes. Because it's baked in, the empty tree is safe for that > (whatever follow-on process runs, it will also be baked in there). > But that's not necessarily true for other "cached" objects. I'm not > really that worried about it because we use it sparingly (the only > call to pretend_sha1_file() is in git-blame, and if it ever did ask > "do we have this object", I actually think the right answer would be > "yes"). ... and I realize that I should have read ahead before writing the four lines above myself ;-) We are on the same page. > But if this is a concern, we could perhaps have two levels of flags: > SKIP_CACHED and SKIP_INTERNAL. > > builtin/rev-list.c | 2 +- > t/t1060-object-corruption.sh | 10 ++++++++++ > 2 files changed, 11 insertions(+), 1 deletion(-) > > diff --git a/builtin/rev-list.c b/builtin/rev-list.c > index 49d6deed70..877b6561f4 100644 > --- a/builtin/rev-list.c > +++ b/builtin/rev-list.c > @@ -237,7 +237,7 @@ static inline void finish_object__ma(struct object *obj) > static int finish_object(struct object *obj, const char *name, void *cb_data) > { > struct rev_list_info *info = cb_data; > - if (!has_object_file(&obj->oid)) { > + if (oid_object_info_extended(the_repository, &obj->oid, NULL, 0) < 0) { > finish_object__ma(obj); > return 1; > } > diff --git a/t/t1060-object-corruption.sh b/t/t1060-object-corruption.sh > index ac1f189fd2..807b63b473 100755 > --- a/t/t1060-object-corruption.sh > +++ b/t/t1060-object-corruption.sh > @@ -125,4 +125,14 @@ test_expect_success 'fetch into corrupted repo with index-pack' ' > ) > ' > > +test_expect_success 'internal tree objects are not "missing"' ' > + git init missing-empty && > + ( > + cd missing-empty && > + empty_tree=$(git hash-object -t tree /dev/null) && > + commit=$(echo foo | git commit-tree $empty_tree) && > + git rev-list --objects $commit > + ) > +' > + > test_done