Re: [PATCH v2] unpack-trees: avoid duplicate ODB lookups during checkout

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Am 07.04.2017 um 17:53 schrieb git@xxxxxxxxxxxxxxxxx:
From: Jeff Hostetler <jeffhost@xxxxxxxxxxxxx>

Teach traverse_trees_recursive() to not do redundant ODB
lookups when both directories refer to the same OID.

In operations such as read-tree, checkout, and merge when
the differences between the commits are relatively small,
there will likely be many directories that have the same
SHA-1.  In these cases we can avoid hitting the ODB multiple
times for the same SHA-1.

This patch handles n=2 and n=3 cases and simply copies the
data rather than repeating the fill_tree_descriptor().

================
On the Windows repo (500K trees, 3.1M files, 450MB index),
this reduced the overall time by 0.75 seconds when cycling
between 2 commits with a single file difference.

(avg) before: 22.699
(avg) after:  21.955
===============

================
Using the p0004-read-tree test (posted earlier this week)
with 1M files on Linux:

before:
$ ./p0004-read-tree.sh
0004.5: switch work1 work2 (1003037)       11.99(8.12+3.32)
0004.6: switch commit aliases (1003037)    11.95(8.20+3.14)

after:
$ ./p0004-read-tree.sh
0004.5: switch work1 work2 (1003037)       11.02(7.71+2.78)
0004.6: switch commit aliases (1003037)    10.95(7.57+2.82)
================

Signed-off-by: Jeff Hostetler <jeffhost@xxxxxxxxxxxxx>
---
  unpack-trees.c | 23 +++++++++++++++++++----
  1 file changed, 19 insertions(+), 4 deletions(-)

diff --git a/unpack-trees.c b/unpack-trees.c
index 3a8ee19..143c5d9 100644
--- a/unpack-trees.c
+++ b/unpack-trees.c
@@ -531,6 +531,11 @@ static int switch_cache_bottom(struct traverse_info *info)
  	return ret;
  }
+static inline int are_same_oid(struct name_entry *name_j, struct name_entry *name_k)
+{
+	return name_j->oid && name_k->oid && !oidcmp(name_j->oid, name_k->oid);
+}
+
  static int traverse_trees_recursive(int n, unsigned long dirmask,
  				    unsigned long df_conflicts,
  				    struct name_entry *names,
@@ -554,10 +559,20 @@ static int traverse_trees_recursive(int n, unsigned long dirmask,
  	newinfo.df_conflicts |= df_conflicts;
for (i = 0; i < n; i++, dirmask >>= 1) {
-		const unsigned char *sha1 = NULL;
-		if (dirmask & 1)
-			sha1 = names[i].oid->hash;
-		buf[i] = fill_tree_descriptor(t+i, sha1);
+		if (i > 0 && are_same_oid(&names[i], &names[i-1])) {
+			/* implicitly borrow buf[i-1] inside tree_desc[i] */
+			memcpy(&t[i], &t[i-1], sizeof(struct tree_desc));

An assignment would be simpler:

			t[i] = t[i - 1];

+			buf[i] = NULL;
+		} else if (i > 1 && are_same_oid(&names[i], &names[i-2])) {
+			/* implicitly borrow buf[i-2] inside tree_desc[i] */
+			memcpy(&t[i], &t[i-2], sizeof(struct tree_desc));

Similar case.

+			buf[i] = NULL;

What makes the previous two entries special, or differently put: Why not just check *all* previous entries? MAX_UNPACK_TREES is 8; the number of comparisons would just about double in the worst case and stay the same for three trees or less. The order of four trees or more wouldn't matter anymore.

+		} else {
+			const unsigned char *sha1 = NULL;
+			if (dirmask & 1)
+				sha1 = names[i].oid->hash;
+			buf[i] = fill_tree_descriptor(t+i, sha1);
+		}
  	}
bottom = switch_cache_bottom(&newinfo);




[Index of Archives]     [Linux Kernel Development]     [Gcc Help]     [IETF Annouce]     [DCCP]     [Netdev]     [Networking]     [Security]     [V4L]     [Bugtraq]     [Yosemite]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Linux SCSI]     [Fedora Users]