On 10/10/2021 23:03, Junio C Hamano wrote:
Phillip Wood <phillip.wood123@xxxxxxxxx> writes:
On 08/10/2021 19:31, Junio C Hamano wrote:
Victoria Dye <vdye@xxxxxxxxxx> writes:
Phillip Wood wrote:
I was looking at the callers to prime_cache_tree() this morning
and would like to suggest an alternative approach - just delete
prime_cache_tree() and all of its callers!
Do you mean the calls added by new patches without understanding
what they are doing, or all calls to it?
I mean all calls to prime_cache_tree() after having understood (or at
least thinking that I understand) what they are doing.
Sorry, my statement was confusingly written. I meant "calls added
by new patches, written by those who do not understand what
prime_cache_tree() calls are doing", but after re-reading it, I
think it could be taken to be referring to "you may be commenting
without understanding what prime_cache_tree() calls are doing",
which wasn't my intention.
Thanks for clarifying that, I had misunderstood what you had written.
(a) a successful call to unpack_trees() updates the cache tree
(b) all the existing calls to prime_cache_tree() follow a successful
call to unpack_trees() and nothing touches in index in between the
call to unpack_trees() and prime_cache_tree().
Ahh, OK.
I think we originally avoided calling cache_tree_update() lightly
(because it is essentially a "write-tree", a fairly heavy-weight
operation, without I/O) and instead relied on prime_cache_tree() to
get degraded cache-tree back into freshness.
What I forgot was that 52fca218 (unpack-trees: populate cache-tree
on successful merge, 2015-07-28) added cache_tree_update() there at
the end of unpack_trees(). The commit covers quite a wide range of
operations---the log message says "merge", but in fact anything that
uses unpack_trees() including branch switching and the resetting of
the index are affected, and they cause a full reconstruction of the
cache tree by calling cache_tree_update().
For most callers of prime_cache_tree(), like the ones in "git
read-tree" and "git reset", it is immediately obvious that we just
read from the same tree, and we should have everything from the tree
and nothing else in the resulting index, so it is clear that the
prime_cache_tree() call is recreating the same cache-tree
information that we already should have computed ourselves, and
these calls can go (or if "prime" is still cheaper than "update",
these callers can pass an option to tell unpack_trees() to skip the
cache_tree_update() call, because they will call "prime" immediately
after).
I haven't really thought this through but could we teach unpack_trees()
to call prime_cache_tree() rather than cache_tree_update() when that
would be safe? For callers that use oneway_merge() merge it should
always be safe I think and it might be possible to modify twoway_merge()
to signal if the final tree in the index matches the second one passed
to it. We could have a more general mechanism for the callback to signal
if it is safe to prime the tree but I suspect the callers that are using
custom callbacks are not updating the whole tree.
Best Wishes
Phillip
For other callers it is not immediately obvious, but I trust you are
correctly reading the code ;-)
Thanks.