Re: git reset for index restoration?

Junio C Hamano <gitster@xxxxxxxxx> · Thu, 22 May 2014 14:34:54 -0700

Jeff King <peff@xxxxxxxx> writes:

> [+cc Junio for cache-tree expertise]
> ...
> We never call reset_index now, because we handle it via diff.  We could
> call prime_cache_tree in this case, but I'm not sure if that is a good
> idea, because it primes it from scratch (and so it opens up all those
> trees that we are trying to avoid touching). I'm not sure if there's an
> easy way to update it incrementally; I don't know the cache-tree code
> very well.

The cache-tree is designed to start in a well-populated state,
allowing you to efficiently smudge the part you touched by
invalidating while keeping the parts you haven't touched intact.
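
To illustrate the invalidation idea, here is a toy model in Python. It
is not the real cache-tree code: the Node class, build() and
invalidate_path() are made-up names for this sketch, and a boolean
"valid" flag stands in for whatever the real structure records. It
only shows that touching one path invalidates the directories leading
to it and leaves everything else alone.

```python
class Node:
    """Toy stand-in for one cache-tree node (one directory)."""
    def __init__(self):
        self.valid = True      # assumption: a plain flag marks validity
        self.children = {}     # directory name -> Node


def build(dirs):
    """Build a fully valid toy tree from a list of directory paths."""
    root = Node()
    for d in dirs:
        node = root
        for part in d.split("/"):
            node = node.children.setdefault(part, Node())
    return root


def invalidate_path(root, path):
    """Mark the root and every directory leading to `path` as invalid."""
    node = root
    node.valid = False
    for part in path.split("/")[:-1]:   # drop the file name itself
        node = node.children.get(part)
        if node is None:
            return                      # directory not tracked in the toy tree
        node.valid = False


root = build(["Documentation", "t/lib", "builtin"])
invalidate_path(root, "t/lib/test.sh")
# root, t and t/lib are now invalid; Documentation and builtin are intact.
```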

What is missing in its API is finer-grained support to let us
say "it has degraded too much and we need to bring it into a
well-populated state again for it to be truly useful as an
optimization."  There are only two ways to revive a degraded
cache-tree.  One is write_cache_as_tree(), in which case we have to
compute the necessary tree object names anyway (so there is no point
discarding the result of the computation).  The other is a call to
prime_cache_tree(), in which case we happen to know that the whole
index contents must match the whole tree structure represented by
one tree object.

Both aim to restore the cache-tree into a fully-populated state, and
there is no support to populate it "well enough" by doing anything
incremental.  You can call the write-tree side incremental, because
it reuses what is still valid without recomputing tree objects for
those parts, but the result is still a fully-populated state.
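
The sense in which the write-tree side is incremental can be sketched
with another toy model in Python. Again, this is not the real code:
Node, write_tree() and the "stats" counter are hypothetical, and the
hashing is only a stand-in for computing tree object names. A node
whose cached name is still present is reused without descending into
it, yet after the call every node has a name.

```python
import hashlib


class Node:
    """Toy directory node; `name` is the cached tree name, None if invalid."""
    def __init__(self, files=None, children=None):
        self.files = files or {}         # file name -> content (str)
        self.children = children or {}   # directory name -> Node
        self.name = None


def write_tree(node, stats):
    """Return the name for `node`, recomputing only invalid nodes."""
    if node.name is not None:
        return node.name                 # still valid: reuse, do not descend
    stats["recomputed"] += 1
    h = hashlib.sha1()
    for fname in sorted(node.files):
        h.update(f"blob {fname} {node.files[fname]}\n".encode())
    for dname in sorted(node.children):
        child_name = write_tree(node.children[dname], stats)
        h.update(f"tree {dname} {child_name}\n".encode())
    node.name = h.hexdigest()
    return node.name


lib = Node(files={"a.sh": "1"})
t = Node(children={"lib": lib})
doc = Node(files={"git.txt": "x"})
root = Node(children={"t": t, "Documentation": doc})

stats = {"recomputed": 0}
write_tree(root, stats)          # first run has to compute all four nodes
first = stats["recomputed"]

lib.files["a.sh"] = "2"          # touch one file ...
for n in (root, t, lib):         # ... and invalidate the directories above it
    n.name = None
stats["recomputed"] = 0
write_tree(root, stats)          # Documentation is reused, the rest recomputed
```

The second call does less work than the first, but it still leaves
the whole tree valid; there is no way to ask it to stop early.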

Adding finer-grained support is not against the overall design,
but it was unclear what such additional API functions should look
like, and where we could call them safely, at least back when we
were actively improving it.  Two that come to mind are:

 - We know that the subtrees down in this directory are degraded too
   much; write-tree only the subtrees that correspond to this
   directory without restoring other parts of the tree.

 - We just populated the index with the subtrees in this directory
   and know that they should match the tree hierarchy exactly.
   prime-cache-tree only those parts without affecting other parts
   of the tree.

As with calls to existing (whole-tree) prime-cache-tree, the latter
is an error-prone optimization---I think we had cases where we said
"after this operation, we know that the index must exactly match the
tree we used to muck with the index" and added a call, and later
discovered that "must exactly match" was not true.

The former forces recomputation, so there is much less safety
concern.
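
The difference in safety between the two ideas above can be shown
with a toy model in Python. Neither function exists in the real API;
write_subtree() and prime_subtree() are hypothetical names, and the
hashing only stands in for computing tree object names. The first
recomputes names from the actual contents, so it cannot record a
wrong name. The second copies names on trust, so it records a stale
name whenever the "must exactly match" assumption is false.

```python
import hashlib


class Node:
    """Toy directory node; `name` is the cached tree name, None if invalid."""
    def __init__(self, files=None, children=None):
        self.files = dict(files or {})
        self.children = dict(children or {})
        self.name = None


def compute(node):
    """Recompute names for `node` and everything below it."""
    h = hashlib.sha1()
    for f in sorted(node.files):
        h.update(f"blob {f} {node.files[f]}\n".encode())
    for d in sorted(node.children):
        h.update(f"tree {d} {compute(node.children[d])}\n".encode())
    node.name = h.hexdigest()
    return node.name


def find(root, dirpath):
    """Walk from `root` down to the node for `dirpath`."""
    node = root
    for part in dirpath.split("/"):
        node = node.children[part]
    return node


def write_subtree(root, dirpath):
    """Hypothetical: recompute only the subtrees under `dirpath`."""
    return compute(find(root, dirpath))


def prime_subtree(root, dirpath, tree):
    """Hypothetical: trust that `dirpath` matches `tree` and copy its names."""
    def copy(dst, src):
        dst.name = src.name
        for d, child in dst.children.items():
            if d in src.children:
                copy(child, src.children[d])
    copy(find(root, dirpath), tree)


def make():
    return Node(children={"t": Node(files={"a.sh": "1"}),
                          "doc": Node(files={"x.txt": "x"})})


index = make()
write_subtree(index, "t")        # only t becomes valid; root and doc untouched

commit = make()
compute(commit)                  # stands in for an existing tree object
index2 = make()
index2.children["t"].files["a.sh"] = "edited"   # index no longer matches
prime_subtree(index2, "t", commit.children["t"])
stale = index2.children["t"].name               # name copied on trust
actual = compute(index2.children["t"])          # name recomputation would give
```

Here "stale" and "actual" differ, which is the kind of breakage a
misplaced priming call would introduce.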