On 08/10/2021 19:31, Junio C Hamano wrote:
Victoria Dye <vdye@xxxxxxxxxx> writes:
Phillip Wood wrote:
I was looking at the callers to prime_cache_tree() this morning
and would like to suggest an alternative approach - just delete
prime_cache_tree() and all of its callers!
Do you mean the calls added by new patches without understanding
what they are doing, or all calls to it?
I mean all calls to prime_cache_tree() after having understood (or at
least thinking that I understand) what they are doing. As I tried to
explain in the part of my message that you have cut:
(a) a successful call to unpack_trees() updates the cache tree
(b) all the existing calls to prime_cache_tree() follow a successful
call to unpack_trees() and nothing touches the index in between the call
to unpack_trees() and prime_cache_tree().
Maybe I've misunderstood something but that leads me to believe those calls
can be removed without degrading performance.
Best Wishes
Phillip
Every time you update a path in the index from the working tree
(e.g. "git add") and other sources, the directory in the cache-tree
that includes the path is invalidated, and the surviving subtrees of
the cache-tree are used to speed up writing the index as a tree object,
doing "diff-index --cached" (hence "git status"), etc. So over
time, the cache-tree "degrades" as you muck with the index entries.
When you write out the index as a tree, we by definition have to
know the object names of all the tree objects that correspond to
each directory in the index. A fully valid cache-tree is saved when
it happens, so the above process can start over.
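To make the degrade-and-recover cycle above concrete, here is a minimal toy model in Python. It is only a sketch, not git's implementation: the class ToyCacheTree, its methods, and the sha1-of-repr "tree ids" are all made up for illustration. It assumes that updating a path invalidates every directory leading down to that path, and that writing a tree recomputes only the invalidated directories.

```python
import hashlib


class ToyCacheTree:
    """Toy model: each directory maps to a tree id, or None once invalidated."""

    def __init__(self, index):
        self.index = dict(index)  # path -> stand-in for a blob id
        self.trees = {}           # directory ("" = top level) -> tree id or None

    @staticmethod
    def dirs_of(path):
        # "src/lib/util.c" -> ["", "src", "src/lib"]
        parts = path.split("/")[:-1]
        return [""] + ["/".join(parts[:i + 1]) for i in range(len(parts))]

    def update_path(self, path, blob):
        # Like "git add": the entry changes, so no directory on the way
        # down to it has a known tree id any more.
        self.index[path] = blob
        for d in self.dirs_of(path):
            self.trees[d] = None

    def write_tree(self):
        # Recompute only the invalid directories, deepest first, and
        # return the sorted list of directories that had to be recomputed.
        dirs = {d for p in self.index for d in self.dirs_of(p)}
        recomputed = []
        for d in sorted(dirs, key=lambda d: d.count("/") + bool(d), reverse=True):
            if self.trees.get(d) is not None:
                continue  # surviving subtree: reuse the cached id
            prefix = d + "/" if d else ""
            entries = [("blob", p, b) for p, b in self.index.items()
                       if p.startswith(prefix) and "/" not in p[len(prefix):]]
            entries += [("tree", s, self.trees[s]) for s in dirs
                        if s and s.startswith(prefix) and "/" not in s[len(prefix):]]
            self.trees[d] = hashlib.sha1(repr(sorted(entries)).encode()).hexdigest()
            recomputed.append(d)
        return sorted(recomputed)


ct = ToyCacheTree({"Makefile": "b1", "src/main.c": "b2",
                   "src/lib/util.c": "b3", "docs/readme.txt": "b4"})
first = ct.write_tree()             # nothing cached yet: every directory is computed
ct.update_path("src/main.c", "b5")  # degrades "" and "src" only
second = ct.write_tree()            # "docs" and "src/lib" are reused
```

In this toy, the second write only has to recompute the top level and "src"; the untouched subtrees keep their cached ids, which is the speed-up being described.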
There are cases other than "git write-tree" in which we can cheaply
learn the object names of all the tree objects that correspond to
each directory in the index. When we read the index from an
existing tree object, we know which tree (and its subtrees) we
populated the index from, so we can salvage a degraded cache-tree.
"reset --hard" and "reset --mixed" may be good opportunities, as is
"checkout <branch>" when it starts from a clean index. And cache tree
priming is a mechanism to take advantage of such an opportunity.
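As a rough illustration of why priming is cheap when the index is populated from an existing tree, here is a small Python sketch. Everything in it is hypothetical (the TREES object store, the ids, and the function read_tree_and_prime); it is not how git stores objects. The point it tries to show is that the tree ids are already known while walking the tree, so the cache can be filled in without hashing anything.

```python
# Hypothetical object store: tree id -> {entry name: (kind, object id)}
TREES = {
    "T_root": {"Makefile": ("blob", "B1"), "src": ("tree", "T_src")},
    "T_src": {"main.c": ("blob", "B2"), "lib": ("tree", "T_lib")},
    "T_lib": {"util.c": ("blob", "B3")},
}


def read_tree_and_prime(tree_id, prefix="", index=None, cache_tree=None):
    """Fill a flat index from a tree and record each directory's tree id."""
    index = {} if index is None else index
    cache_tree = {} if cache_tree is None else cache_tree
    # We are reading this directory from tree_id, so its id is known for free.
    cache_tree[prefix.rstrip("/")] = tree_id
    for name, (kind, oid) in TREES[tree_id].items():
        if kind == "tree":
            read_tree_and_prime(oid, prefix + name + "/", index, cache_tree)
        else:
            index[prefix + name] = oid
    return index, cache_tree


index, cache_tree = read_tree_and_prime("T_root")
```

After the walk, every directory in the toy cache has a valid id, so a degraded cache would be fully restored as a side effect of reading the tree.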
The cache-tree does not have to be primed and all you lose is
performance, so priming can be removed mostly "without an issue", if
you are not paying attention to cache-tree degradation. Priming
with incorrect data, however, would leave permanent damage by
writing a wrong tree via "git write-tree" (hence "git commit") and
showing a wrong diff via "git diff-index [--cached]" (hence "git
status" and probably "git add -- <pathspec>"), so not priming is
safer than priming incorrectly.
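The asymmetry between "not primed" and "primed incorrectly" can be sketched with a deliberately tiny Python toy. It is single-level and entirely hypothetical (the write_tree function here and its sha1-of-repr ids are stand-ins, not git code), and it assumes only that a cached id is trusted without re-checking the index entries.

```python
import hashlib


def write_tree(index, cached_root):
    # Toy, single-level "write-tree": a non-None cached id is trusted as-is.
    if cached_root is not None:
        return cached_root
    return hashlib.sha1(repr(sorted(index.items())).encode()).hexdigest()


index = {"a.txt": "b1", "b.txt": "b2"}
correct = write_tree(index, None)  # no cache: computed from the entries

index["a.txt"] = "b3"              # the index changes ...
stale = correct                    # ... but the cache is "primed" with the old id

wrong = write_tree(index, stale)   # trusts the stale id: wrong tree
right = write_tree(index, None)    # unprimed: slower, but correct
```

With no cached id the toy pays the cost of recomputing and gets the right answer; with a stale id it returns the old tree without noticing that the index has changed.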
HTH.