Am 06.04.2017 um 22:37 schrieb git@xxxxxxxxxxxxxxxxx:
From: Jeff Hostetler <jeffhost@xxxxxxxxxxxxx>
Teach traverse_trees_recursive() to not do redundant ODB
lookups when both directories refer to the same OID.
In operations such as read-tree, checkout, and merge when
the differences between the commits are relatively small,
there will likely be many directories that have the same
SHA-1. In these cases we can avoid hitting the ODB multiple
times for the same SHA-1.
TODO This change is a first attempt to test that by comparing
TODO the hashes of name[i] and name[i-i] and simply copying
TODO the tree-descriptor data. I was thinking of the n=2
TODO case here. We may want to extend this to the n=3 case.
================
On the Windows repo (500K trees, 3.1M files, 450MB index),
this reduced the overall time by 0.75 seconds when cycling
between 2 commits with a single file difference.
(avg) before: 22.699
(avg) after: 21.955
===============
================
Using the p0004-read-tree test (posted earlier this week)
with 1M files on Linux:
before:
$ ./p0004-read-tree.sh
0004.5: switch work1 work2 (1003037) 11.99(8.12+3.32)
0004.6: switch commit aliases (1003037) 11.95(8.20+3.14)
after:
$ ./p0004-read-tree.sh
0004.5: switch work1 work2 (1003037) 11.17(7.84+2.76)
0004.6: switch commit aliases (1003037) 11.13(7.82+2.72)
================
Signed-off-by: Jeff Hostetler <jeffhost@xxxxxxxxxxxxx>
---
tree-walk.c | 8 ++++++++
tree-walk.h | 1 +
unpack-trees.c | 13 +++++++++----
3 files changed, 18 insertions(+), 4 deletions(-)
diff --git a/tree-walk.c b/tree-walk.c
index ff77605..3b82f0e 100644
--- a/tree-walk.c
+++ b/tree-walk.c
@@ -92,6 +92,14 @@ void *fill_tree_descriptor(struct tree_desc *desc, const unsigned char *sha1)
return buf;
}
+void *copy_tree_descriptor(struct tree_desc *dest, const struct tree_desc *src)
+{
+ void *buf = xmalloc(src->size);
+ memcpy(buf, src->buffer, src->size);
+ init_tree_desc(dest, buf, src->size);
+ return buf;
+}
+
static void entry_clear(struct name_entry *a)
{
memset(a, 0, sizeof(*a));
diff --git a/tree-walk.h b/tree-walk.h
index 68bb78b..ca4032b 100644
--- a/tree-walk.h
+++ b/tree-walk.h
@@ -43,6 +43,7 @@ int tree_entry(struct tree_desc *, struct name_entry *);
int tree_entry_gently(struct tree_desc *, struct name_entry *);
void *fill_tree_descriptor(struct tree_desc *desc, const unsigned char *sha1);
+void *copy_tree_descriptor(struct tree_desc *dest, const struct tree_desc *src);
struct traverse_info;
typedef int (*traverse_callback_t)(int n, unsigned long mask, unsigned long dirmask, struct name_entry *entry, struct traverse_info *);
diff --git a/unpack-trees.c b/unpack-trees.c
index 3a8ee19..50aacad 100644
--- a/unpack-trees.c
+++ b/unpack-trees.c
@@ -554,10 +554,15 @@ static int traverse_trees_recursive(int n, unsigned long dirmask,
newinfo.df_conflicts |= df_conflicts;
for (i = 0; i < n; i++, dirmask >>= 1) {
- const unsigned char *sha1 = NULL;
- if (dirmask & 1)
- sha1 = names[i].oid->hash;
- buf[i] = fill_tree_descriptor(t+i, sha1);
+ if (i > 0 && (dirmask & 1) && names[i].oid && names[i-1].oid &&
Can .oid even be NULL? (I didn't check, but it's dereferenced in the
sha1 assignment below, so I guess the answer is no and these two checks
are not needed.)
+ !hashcmp(names[i].oid->hash, names[i-1].oid->hash)) {
Calling oidcmp would be shorter.
+ buf[i] = copy_tree_descriptor(&t[i], &t[i-1]);
buf keeps track of the allocations that need to be freed at the end of
the function. I assume these buffers are read-only. Can you use an
alias here instead of a duplicate by calling init_tree_desc with the
predecessor's buffer and setting buf[i] to NULL? Or even just copying
t[i - 1] to t[i] with an assignment? That would be shorter and probably
also quicker.
+ } else {
+ const unsigned char *sha1 = NULL;
+ if (dirmask & 1)
+ sha1 = names[i].oid->hash;
+ buf[i] = fill_tree_descriptor(t+i, sha1);
+ }
}
bottom = switch_cache_bottom(&newinfo);