On Mon, Jul 23, 2018 at 04:51:38PM -0400, Ben Peart wrote:
> >>> What's the current state of the index before this checkout?
> >>
> >> This was after running "git checkout" multiple times so there was really
> >> nothing for git to do.
> >
> > Hmm.. this means cache-tree is fully valid, unless you have changes in
> > index. We're quite aggressive in repairing cache-tree since aecf567cbf
> > (cache-tree: create/update cache-tree on checkout - 2014-07-05). If we
> > have very good cache-tree records and still spend 33s on
> > traverse_trees, maybe there's something else.
> >
>
> I'm not at all familiar with the cache-tree and couldn't find any
> documentation on it other than index-format.txt which says "it helps
> speed up tree object generation for a new commit."

I guess you have the starting points you need after Jeff's and Junio's
explanations (and it would be great if cache-tree could actually be
used for this two-way merge). But to make it easier for new people in
the future, maybe we should add this? This is basically a ripoff of
Junio's explanation, plus starting points (write-tree and
index-format.txt). I wanted to incorporate some pieces from Jeff's
too, but I think Junio's already covered it well.

-- 8< --
Subject: [PATCH] cache-tree.h: more description of what it is and what it's used for

Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@xxxxxxxxx>
---
 cache-tree.h | 29 +++++++++++++++++++++++++++++
 1 file changed, 29 insertions(+)

diff --git a/cache-tree.h b/cache-tree.h
index cfd5328cc9..d25a800a72 100644
--- a/cache-tree.h
+++ b/cache-tree.h
@@ -5,6 +5,35 @@
 #include "tree.h"
 #include "tree-walk.h"
 
+/*
+ * cache-tree is an index extension that records tree object names for
+ * subdirectories you see in the index. It is mainly used for
+ * generating trees from the index before you create a new commit (see
+ * builtin/write-tree.c as a starting point) but it's also used in "git
+ * diff-index --cached $TREE" as an optimization. See index-format.txt
+ * for the on-disk format.
+ *
+ * Every time you write the contents of the index as a tree object, we
+ * need to collect the object name for each top-level path and write
+ * a new top-level tree object out, after doing the same recursively
+ * for any modified subdirectory. Whenever you add, remove or modify a
+ * path in the index, the cache-tree entries for enclosing directories
+ * are invalidated, so a cache-tree entry that is still valid means
+ * that all the paths in the index under that directory match the
+ * contents of the tree object that the cache-tree entry holds.
+ *
+ * And that property is used by "diff-index --cached $TREE" that is
+ * run internally. When we find that the subdirectory "D"'s
+ * cache-tree entry is valid in the index, and the tree object
+ * recorded in the cache-tree for that subdirectory matches the
+ * subtree D in the tree object $TREE, then "diff-index --cached"
+ * ignores the entire subdirectory D (which saves relatively little in
+ * the index as it only needs to scan what is already in the memory
+ * forward, but on the $TREE traversal side, it does not have to even
+ * open a subtree, that can save a lot), and with a well-populated
+ * cache-tree, it can save significant processing.
+ */
+
 struct cache_tree;
 struct cache_tree_sub {
 	struct cache_tree *cache_tree;
-- 
2.18.0.656.gda699b98b3

-- 8< --