Re: [PATCH v1 0/3] [RFC] Speeding up checkout (and merge, rebase, etc)

Ben Peart <peartben@xxxxxxxxx> writes:

>> Hmm.. this means cache-tree is fully valid, unless you have changes in
>> index. We're quite aggressive in repairing cache-tree since aecf567cbf
>> (cache-tree: create/update cache-tree on checkout - 2014-07-05). If we
>> have very good cache-tree records and still spend 33s on
>> traverse_trees, maybe there's something else.
>>
>
> I'm not at all familiar with the cache-tree and couldn't find any
> documentation on it other than index-format.txt which says "it helps
> speed up tree object generation for a new commit."  In this particular
> case, no new commit is being created so I don't know that the
> cache-tree would help.

cache-tree is an index extension that records tree object names for
subdirectories you see in the index.  Every time you write the
contents of the index as a tree object, we need to collect the
object name for each top-level path and write a new top-level tree
object out, after doing the same recursively for any modified
subdirectory.  Whenever you add, remove or modify a path in the
index, the cache-tree entries for its enclosing directories are
invalidated, so a cache-tree entry that is still valid means that
all the paths in the index under that directory match the contents
of the tree object that the cache-tree entry holds.
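
To make the invalidation rule concrete, here is a toy model in
Python.  It is only a sketch and not how git stores the extension:
the class name ToyCacheTree, the dict layout, and the object names
like "t-lib" are all made up for illustration.

```python
class ToyCacheTree:
    """Toy model: maps a directory path to a recorded tree name.

    A value of None means the entry is invalid.  The root directory
    is the empty string.  (Hypothetical structure, not git's own.)
    """

    def __init__(self, recorded):
        self.entries = dict(recorded)

    def invalidate_path(self, path):
        # Changing a path invalidates every enclosing directory,
        # from the root down to the immediate parent.
        parts = path.split("/")[:-1]
        self.entries[""] = None
        for i in range(1, len(parts) + 1):
            self.entries["/".join(parts[:i])] = None

    def is_valid(self, directory):
        return self.entries.get(directory) is not None


ct = ToyCacheTree({"": "t-root", "lib": "t-lib",
                   "lib/util": "t-util", "doc": "t-doc"})
ct.invalidate_path("lib/util/str.c")   # e.g. "git add lib/util/str.c"
for d in ("", "lib", "lib/util", "doc"):
    print(repr(d), "valid" if ct.is_valid(d) else "invalid")
```

After the change, only "doc" stays valid; its recorded tree still
describes everything the index has under that directory.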

And that property is used by "diff-index --cached $TREE" that is run
internally.  When we find that the subdirectory "D"'s cache-tree
entry is valid in the index, and the tree object recorded in the
cache-tree for that subdirectory matches the subtree D in the tree
object $TREE, then "diff-index --cached" skips the entire
subdirectory D.  That saves relatively little on the index side, as
it only needs to scan forward over what is already in memory, but on
the $TREE traversal side it does not even have to open the subtree,
which can save a lot.  With a well-populated cache-tree, this can
save a significant amount of processing.
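
The skipping can be sketched the same way.  The Python below is a
toy, not git's traversal code: the function opened_trees, the
"store" dict standing in for the object database, and the names
T0..T3 are assumptions made for illustration.  It only records which
tree objects would have to be opened, and leaves out the per-blob
comparison a real diff does inside each opened directory.

```python
def opened_trees(store, tree_name, cache_tree, path="", opened=None):
    """Return the directories of $TREE whose tree objects get opened.

    store:      tree name -> {entry name: (kind, object name)}
    cache_tree: directory path -> tree name, or None when invalid
    """
    if opened is None:
        opened = []
    # A valid cache-tree entry naming the same tree object means the
    # index and $TREE agree on everything under this directory.
    if cache_tree.get(path) == tree_name:
        return opened
    opened.append(path or ".")
    for name, (kind, obj) in sorted(store[tree_name].items()):
        if kind == "tree":
            child = name if not path else path + "/" + name
            opened_trees(store, obj, cache_tree, child, opened)
    return opened


store = {
    "T0": {"README": ("blob", "b1"), "lib": ("tree", "T1"),
           "doc": ("tree", "T2")},
    "T1": {"a.c": ("blob", "b2"), "util": ("tree", "T3")},
    "T2": {"x.txt": ("blob", "b3")},
    "T3": {"s.c": ("blob", "b4")},
}
# Only lib/a.c was touched: root and "lib" are invalid, the rest valid.
cache_tree = {"": None, "lib": None, "lib/util": "T3", "doc": "T2"}
print(opened_trees(store, "T0", cache_tree))   # ['.', 'lib']
```

With every entry invalid, all four trees would be opened; with the
cache-tree above, "doc" and "lib/util" are never read.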

I think that is what Duy meant to refer to while looking at the
numbers.

> After a quick look at the code, the only place I can find that tries
> to use cache_tree_matches_traversal() is in unpack_callback() and that
> only happens if n == 1 and in the "git checkout" case, n == 2. Am I
> missing something?


