Re: [PATCH v1 0/3] [RFC] Speeding up checkout (and merge, rebase, etc)

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 





On 7/24/2018 11:13 AM, Duy Nguyen wrote:
On Mon, Jul 23, 2018 at 04:51:38PM -0400, Ben Peart wrote:
What's the current state of the index before this checkout?

This was after running "git checkout" multiple times so there was really
nothing for git to do.

Hmm.. this means cache-tree is fully valid, unless you have changes in
index. We're quite aggressive in repairing cache-tree since aecf567cbf
(cache-tree: create/update cache-tree on checkout - 2014-07-05). If we
have very good cache-tree records and still spend 33s on
traverse_trees, maybe there's something else.


I'm not at all familiar with the cache-tree and couldn't find any
documentation on it other than index-format.txt which says "it helps
speed up tree object generation for a new commit."

I guess you have the starting points you need after Jeff's and Junio's
explanation (and it would be great if cache-tree could actually be for
for this two-way merge). But to make it easier for new people in
future, maybe we should add this?

This is basically a ripoff of Junio's explanation with starting points
(write-tree and index-format.txt). I wanted to incorporate some pieces
from Jeff's too but I think Junio's already covered it well.


I definitely like capturing this in the code or documentation somewhere. Given I checked the header file for any hints on the design, I think that is a reasonable place to put it.

-- 8< --
Subject: [PATCH] cache-tree.h: more description of what it is and what's it used for

Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@xxxxxxxxx>
---
  cache-tree.h | 29 +++++++++++++++++++++++++++++
  1 file changed, 29 insertions(+)

diff --git a/cache-tree.h b/cache-tree.h
index cfd5328cc9..d25a800a72 100644
--- a/cache-tree.h
+++ b/cache-tree.h
@@ -5,6 +5,35 @@
  #include "tree.h"
  #include "tree-walk.h"
+/*
+ * cache-tree is an index extension that records tree object names for
+ * subdirectories you see in the index. It is mainly used for
+ * generating trees from the index before you create a new commit (see
+ * builtin/write-tree.c as starting point) but it's also used in "git
+ * diff-index --cached $TREE" as an optimization. See index-format.txt
+ * for on-disk format.
+ *
+ * Every time you write the contents of the index as a tree object, we

I had to read this a couple of times to figure out what was meant by "write the contents of the index as a tree object." Maybe it was just me but how about something like:

"Every time you write a new tree object from the index you need to collect the object name for each top-level path and write a new top-level tree object out and then do the same recursively for any subdirectory."

+ * need to collect the object name for each top-level paths and write
+ * a new top-level tree object out, after doing the same recursively
+ * for any modified subdirectory. Whenever you add, remove or modify a
+ * path in the index, the cache-tree entry for enclosing directories
+ * are invalidated, so a cache-tree entry that is still valid means
+ * that all the paths in the index under that directory match the
+ * contents of the tree object that the cache-tree entry holds.
+ *
+ * And that property is used by "diff-index --cached $TREE" that is
+ * run internally.  When we find that the subdirectory "D"'s
+ * cache-tree entry is valid in the index, and the tree object
+ * recorded in the cache-tree for that subdirectory matches the
+ * subtree D in the tree object $TREE, then "diff-index --cached"
+ * ignores the entire subdirectory D (which saves relatively little in
+ * the index as it only needs to scan what is already in the memory
+ * forward, but on the $TREE traversal side, it does not have to even
+ * open a subtree, that can save a lot), and with a well-populated
+ * cache-tree, it can save a significant processing.
+ */
+
  struct cache_tree;
  struct cache_tree_sub {
  	struct cache_tree *cache_tree;




[Index of Archives]     [Linux Kernel Development]     [Gcc Help]     [IETF Annouce]     [DCCP]     [Netdev]     [Networking]     [Security]     [V4L]     [Bugtraq]     [Yosemite]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Linux SCSI]     [Fedora Users]

  Powered by Linux