[PATCH v3 0/4] Speed up unpack_trees()

This is a minor update that addresses Ben's comments and adds his
measurements to the commit message of 2/4 for the record.

I've also checked the lookahead logic in unpack_trees() to see whether
we accidentally break something there, which is my biggest worry.
See [1] and [2] for context. I believe that since we can't have D/F
conflicts, the situation where lookahead is needed will not occur, so
we should be safe.

[1] da165f470e (unpack-trees.c: prepare for looking ahead in the index - 2010-01-07)
[2] 730f72840c (unpack-trees.c: look ahead in the index - 2009-09-20)

range-diff:

1:  789f7e2872 ! 1:  05eb762d2d unpack-trees.c: add performance tracing
    @@ -1,6 +1,6 @@
     Author: Nguyễn Thái Ngọc Duy <pclouds@xxxxxxxxx>
     
    -    unpack-trees.c: add performance tracing
    +    unpack-trees: add performance tracing
     
         We're going to optimize unpack_trees() a bit in the following
         patches. Let's add some tracing to measure how long it takes before
2:  589bed1366 ! 2:  02286ad123 unpack-trees: optimize walking same trees with cache-tree
    @@ -32,6 +32,24 @@
             0.111793866   0.032933140 s: diff-index
             0.587933288   0.398924370 s: git command: /home/pclouds/w/git/git
     
    +    Another measurement from Ben's running "git checkout" with over 500k
    +    trees (on the whole series):
    +
    +        baseline        new
    +      ----------------------------------------------------------------------
    +        0.535510167     0.556558733     s: read cache .git/index
    +        0.3057373       0.3147105       s: initialize name hash
    +        0.0184082       0.023558433     s: preload index
    +        0.086910967     0.089085967     s: refresh index
    +        7.889590767     2.191554433     s: unpack trees
    +        0.120760833     0.131941267     s: update worktree after a merge
    +        2.2583504       2.572663167     s: repair cache-tree
    +        0.8916137       0.959495233     s: write index, changed mask = 28
    +        3.405199233     0.2710663       s: unpack trees
    +        0.000999667     0.0021554       s: update worktree after a merge
    +        3.4063306       0.273318333     s: diff-index
    +        16.9524923      9.462943133     s: git command: git.exe checkout
    +
         This command calls unpack_trees() twice, the first time on 2way merge
         and the second 1way merge. In both times, "unpack trees" time is
         reduced to one third. Overall time reduction is not that impressive of
    @@ -39,7 +57,6 @@
         repair cache-tree line.
     
         Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@xxxxxxxxx>
    -    Signed-off-by: Junio C Hamano <gitster@xxxxxxxxx>
     
     diff --git a/unpack-trees.c b/unpack-trees.c
     --- a/unpack-trees.c
    @@ -170,7 +187,7 @@
      }
      
     +/*
    -+ * Note that traverse_by_cache_tree() duplicates some logic in this funciton
    ++ * Note that traverse_by_cache_tree() duplicates some logic in this function
     + * without actually calling it. If you change the logic here you may need to
     + * check and change there as well.
     + */
    @@ -189,12 +206,3 @@
      static int unpack_callback(int n, unsigned long mask, unsigned long dirmask, struct name_entry *names, struct traverse_info *info)
      {
      	struct cache_entry *src[MAX_UNPACK_TREES + 1] = { NULL, };
    -@@
    - 	uint64_t start = getnanotime();
    - 
    - 	if (len > MAX_UNPACK_TREES)
    --		die("unpack_trees takes at most %d trees", MAX_UNPACK_TREES);
    -+		die(_("unpack_trees takes at most %d trees"), MAX_UNPACK_TREES);
    - 
    - 	memset(&el, 0, sizeof(el));
    - 	if (!core_apply_sparse_checkout || !o->update)
3:  7c6f863fc0 = 3:  c87b82ffee unpack-trees: reduce malloc in cache-tree walk
4:  6ca17b1138 ! 4:  e791cdfc82 unpack-trees: cheaper index update when walking by cache-tree
    @@ -40,7 +40,6 @@
         attempt should be on that "repair cache-tree" line.
     
         Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@xxxxxxxxx>
    -    Signed-off-by: Junio C Hamano <gitster@xxxxxxxxx>
     
     diff --git a/cache.h b/cache.h
     --- a/cache.h
    @@ -119,20 +118,6 @@
      	free(tree_ce);
      	if (o->debug_unpack)
      		printf("Unpacked %d entries from %s to %s using cache-tree\n",
    -@@
    - 		if (!ret) {
    - 			if (!o->result.cache_tree)
    - 				o->result.cache_tree = cache_tree();
    -+			/*
    -+			 * TODO: Walk o.src_index->cache_tree, quickly check
    -+			 * if o->result.cache has the exact same content for
    -+			 * any valid cache-tree in o.src_index, then we can
    -+			 * just copy the cache-tree over instead of hashing a
    -+			 * new tree object.
    -+			 */
    - 			if (!cache_tree_fully_valid(o->result.cache_tree))
    - 				cache_tree_update(&o->result,
    - 						  WRITE_TREE_SILENT |
     
     diff --git a/unpack-trees.h b/unpack-trees.h
     --- a/unpack-trees.h

Nguyễn Thái Ngọc Duy (4):
  unpack-trees: add performance tracing
  unpack-trees: optimize walking same trees with cache-tree
  unpack-trees: reduce malloc in cache-tree walk
  unpack-trees: cheaper index update when walking by cache-tree

 cache-tree.c   |   2 +
 cache.h        |   1 +
 read-cache.c   |   3 +-
 unpack-trees.c | 152 +++++++++++++++++++++++++++++++++++++++++++++++++
 unpack-trees.h |   1 +
 5 files changed, 158 insertions(+), 1 deletion(-)

-- 
2.18.0.656.gda699b98b3