Hi Jonathan, On Thu, 22 Apr 2021, Jonathan Tan wrote: > In update_one() (used only by cache_tree_update()), there is an object > existence check that, if it fails, will automatically trigger a lazy > fetch in a partial clone. But the fetch is not necessary - the object is > not actually being used. I find it curious, though, that the `ce_missing_ok` variable is defined thusly (sadly, the context of your diff is too small to show it): ce_missing_ok = mode == S_IFGITLINK || missing_ok || (has_promisor_remote() && ce_skip_worktree(ce)); Which means that the `has_object_file()` function is only called if the entry is not marked with the `skip-worktree` bit, i.e. if it is _not_ excluded from the sparse checkout. Wouldn't that mean that the object _should_ be there? I guess what I am saying is that while the commit message focuses on the "What?" of the patch, I would love to hear more about the "Why?". And maybe the "When?" as in: when does this actually matter? And since the bug was critical enough for you to spend time on crafting it, maybe it would make sense to add a regression test to ensure that this bug does not creep in again? > > Replace that check with two checks: an object existence check that does > not fetch, and then a check that that object is a promisor object. This essentially repeats what the diff says, but it might make more sense to explain why the post-image of this diff is more correct (and maybe discuss performance implications). > > Doing this avoids multiple lazy fetches when merging two trees in a > partial clone, as noticed at $DAYJOB. Ah. But where are those trees fetched, then? Maybe lead with the description of the bug? > Signed-off-by: Jonathan Tan <jonathantanmy@xxxxxxxxxx> > --- > Another alternative is to think about whether the object existence check > here is needed in the first place. > > There might also be other places we can make a similar change in > update_one(), but I limited myself to what's needed to solve the > specific case we discovered at $DAYJOB. I only see another `has_object_file()` call site at the very beginning, and I think this needs to fetch. Or maybe it is more efficient to construct the cache tree from scratch than fetch it? There is also `cache_tree_fully_valid_1()`, where I think the same handling could potentially make sense. (Or, if you target `seen`, `cache_tree_fully_valid()`. Ciao, Johannes > --- > cache-tree.c | 5 ++++- > 1 file changed, 4 insertions(+), 1 deletion(-) > > diff --git a/cache-tree.c b/cache-tree.c > index add1f07713..6728722597 100644 > --- a/cache-tree.c > +++ b/cache-tree.c > @@ -6,6 +6,7 @@ > #include "object-store.h" > #include "replace-object.h" > #include "promisor-remote.h" > +#include "packfile.h" > > #ifndef DEBUG_CACHE_TREE > #define DEBUG_CACHE_TREE 0 > @@ -362,7 +363,9 @@ static int update_one(struct cache_tree *it, > (has_promisor_remote() && > ce_skip_worktree(ce)); > if (is_null_oid(oid) || > - (!ce_missing_ok && !has_object_file(oid))) { > + (!ce_missing_ok && > + !has_object_file_with_flags(oid, OBJECT_INFO_SKIP_FETCH_OBJECT) && > + !is_promisor_object(oid))) { > strbuf_release(&buffer); > if (expected_missing) > return -1; > -- > 2.31.1.498.g6c1eba8ee3d-goog > >