From: Jeff Hostetler <jeffhost@xxxxxxxxxxxxx> Teach traverse_trees_recursive() to not do redundant ODB lookups when both directories refer to the same OID. In operations such as read-tree, checkout, and merge when the differences between the commits are relatively small, there will likely be many directories that have the same SHA-1. In these cases we can avoid hitting the ODB multiple times for the same SHA-1. TODO This change is a first attempt to test that by comparing TODO the hashes of name[i] and name[i-i] and simply copying TODO the tree-descriptor data. I was thinking of the n=2 TODO case here. We may want to extend this to the n=3 case. ================ On the Windows repo (500K trees, 3.1M files, 450MB index), this reduced the overall time by 0.75 seconds when cycling between 2 commits with a single file difference. (avg) before: 22.699 (avg) after: 21.955 =============== ================ Using the p0004-read-tree test (posted earlier this week) with 1M files on Linux: before: $ ./p0004-read-tree.sh 0004.5: switch work1 work2 (1003037) 11.99(8.12+3.32) 0004.6: switch commit aliases (1003037) 11.95(8.20+3.14) after: $ ./p0004-read-tree.sh 0004.5: switch work1 work2 (1003037) 11.17(7.84+2.76) 0004.6: switch commit aliases (1003037) 11.13(7.82+2.72) ================ Signed-off-by: Jeff Hostetler <jeffhost@xxxxxxxxxxxxx> --- tree-walk.c | 8 ++++++++ tree-walk.h | 1 + unpack-trees.c | 13 +++++++++---- 3 files changed, 18 insertions(+), 4 deletions(-) diff --git a/tree-walk.c b/tree-walk.c index ff77605..3b82f0e 100644 --- a/tree-walk.c +++ b/tree-walk.c @@ -92,6 +92,14 @@ void *fill_tree_descriptor(struct tree_desc *desc, const unsigned char *sha1) return buf; } +void *copy_tree_descriptor(struct tree_desc *dest, const struct tree_desc *src) +{ + void *buf = xmalloc(src->size); + memcpy(buf, src->buffer, src->size); + init_tree_desc(dest, buf, src->size); + return buf; +} + static void entry_clear(struct name_entry *a) { memset(a, 0, sizeof(*a)); diff --git a/tree-walk.h b/tree-walk.h index 68bb78b..ca4032b 100644 --- a/tree-walk.h +++ b/tree-walk.h @@ -43,6 +43,7 @@ int tree_entry(struct tree_desc *, struct name_entry *); int tree_entry_gently(struct tree_desc *, struct name_entry *); void *fill_tree_descriptor(struct tree_desc *desc, const unsigned char *sha1); +void *copy_tree_descriptor(struct tree_desc *dest, const struct tree_desc *src); struct traverse_info; typedef int (*traverse_callback_t)(int n, unsigned long mask, unsigned long dirmask, struct name_entry *entry, struct traverse_info *); diff --git a/unpack-trees.c b/unpack-trees.c index 3a8ee19..50aacad 100644 --- a/unpack-trees.c +++ b/unpack-trees.c @@ -554,10 +554,15 @@ static int traverse_trees_recursive(int n, unsigned long dirmask, newinfo.df_conflicts |= df_conflicts; for (i = 0; i < n; i++, dirmask >>= 1) { - const unsigned char *sha1 = NULL; - if (dirmask & 1) - sha1 = names[i].oid->hash; - buf[i] = fill_tree_descriptor(t+i, sha1); + if (i > 0 && (dirmask & 1) && names[i].oid && names[i-1].oid && + !hashcmp(names[i].oid->hash, names[i-1].oid->hash)) { + buf[i] = copy_tree_descriptor(&t[i], &t[i-1]); + } else { + const unsigned char *sha1 = NULL; + if (dirmask & 1) + sha1 = names[i].oid->hash; + buf[i] = fill_tree_descriptor(t+i, sha1); + } } bottom = switch_cache_bottom(&newinfo); -- 2.9.3