On 08/10/2021 18:14, Victoria Dye wrote:
Phillip Wood wrote:
Hi Victoria
On 07/10/2021 22:15, Victoria Dye via GitGitGadget wrote:
From: Victoria Dye <vdye@xxxxxxxxxx>
Remove `ensure_full_index` guard on `prime_cache_tree` and update
`prime_cache_tree_rec` to correctly reconstruct sparse directory entries in
the cache tree. While processing a tree's entries, `prime_cache_tree_rec`
must determine whether a directory entry is sparse or not by searching for
it in the index (*without* expanding the index). If a matching sparse
directory index entry is found, no subtrees are added to the cache tree
entry and the entry count is set to 1 (representing the sparse directory
itself). Otherwise, the tree is assumed to not be sparse and its subtrees
are recursively added to the cache tree.
I was looking at the callers to prime_cache_tree() this morning and would like to suggest an alternative approach - just delete prime_cache_tree() and all of its callers! As far as I can see it is only ever called after a successful call to unpack_trees() and since 52fca2184d ("unpack-trees: populate cache-tree on successful merge", 2015-07-28) unpack_trees() updates the cache tree for the caller. All the call sites are pretty obvious apart from the one in t/help/test-fast-rebase.c where unpack_trees() is called by merge_switch_to_result()
It looks like `prime_cache_tree` can be removed mostly without issue, but
it causes the two last tests in `t4058-diff-duplicates.sh` to fail. Those
tests document failure cases when dealing with duplicate tree entries [1],
and it looks like `prime_cache_tree` was creating the appearance of a
fully-reset index but was still leaving it in a state where subsequent
operations could fail.
I'm inclined to say the solution here would be to update the tests to
document the "new" failure behavior and proceed with removing
`prime_cache_tree`, because:
* the test using `git reset --hard` disables `GIT_TEST_CHECK_CACHE_TREE`,
indicating that `prime_cache_tree` already wasn't behaving correctly
* attempting to fix the overarching issues with duplicate tree entries will
substantially delay this patch series
* a duplicate entry fix is largely unrelated to the intended scope of the
series
That sounds like a good way forward
Best Wishes
Phillip
Another option would be to leave `prime_cache_tree` as it is, but with it
being apparently useless outside of mostly-broken use cases in `t4058`, it
seems like a waste to keep it around.
[1] ac14de13b2 (t4058: explore duplicate tree entry handling in a bit more detail, 2020-12-11)