Ben Peart <peartben@xxxxxxxxx> writes:

>> Hmm.. this means cache-tree is fully valid, unless you have changes in
>> index. We're quite aggressive in repairing cache-tree since aecf567cbf
>> (cache-tree: create/update cache-tree on checkout - 2014-07-05). If we
>> have very good cache-tree records and still spend 33s on
>> traverse_trees, maybe there's something else.
>>
>
> I'm not at all familiar with the cache-tree and couldn't find any
> documentation on it other than index-format.txt which says "it helps
> speed up tree object generation for a new commit." In this particular
> case, no new commit is being created so I don't know that the
> cache-tree would help.

cache-tree is an index extension that records tree object names for
subdirectories you see in the index. Every time you write the contents
of the index as a tree object, we need to collect the object name for
each top-level path and write a new top-level tree object out, after
doing the same recursively for any modified subdirectory. Whenever you
add, remove or modify a path in the index, the cache-tree entries for
the enclosing directories are invalidated, so a cache-tree entry that
is still valid means that all the paths in the index under that
directory match the contents of the tree object that the cache-tree
entry holds.

That property is used by the "diff-index --cached $TREE" that is run
internally. When we find that the cache-tree entry for subdirectory "D"
is valid in the index, and the tree object recorded in it matches the
subtree D in the tree object $TREE, then "diff-index --cached" ignores
the entire subdirectory D. On the index side this saves relatively
little, as it only needs to skip forward over entries that are already
in memory, but on the $TREE traversal side it does not even have to
open the subtree, which can save a lot. With a well-populated
cache-tree, it can save a significant amount of processing.

I think that is what Duy was referring to while looking at the numbers.

> After a quick look at the code, the only place I can find that tries
> to use cache_tree_matches_traversal() is in unpack_callback() and that
> only happens if n == 1 and in the "git checkout" case, n == 2. Am I
> missing something?