From: Jeff Hostetler <jeffhost@xxxxxxxxxxxxx>

Teach traverse_trees_recursive() to not do redundant ODB
lookups when both directories refer to the same OID.

In operations such as read-tree, checkout, and merge, when
the differences between the commits are relatively small,
there will likely be many directories that have the same
SHA-1. In these cases we can avoid hitting the ODB multiple
times for the same SHA-1.

This patch handles the n=2 and n=3 cases and simply copies
the data rather than repeating the fill_tree_descriptor().

================
On the Windows repo (500K trees, 3.1M files, 450MB index),
this reduced the overall time by 0.75 seconds when cycling
between 2 commits with a single file difference.

(avg) before: 22.699
(avg) after:  21.955
================
Using the p0004-read-tree test (posted earlier this week)
with 1M files on Linux:

before:
$ ./p0004-read-tree.sh
0004.5: switch work1 work2 (1003037)      11.99(8.12+3.32)
0004.6: switch commit aliases (1003037)   11.95(8.20+3.14)

after:
$ ./p0004-read-tree.sh
0004.5: switch work1 work2 (1003037)      11.02(7.71+2.78)
0004.6: switch commit aliases (1003037)   10.95(7.57+2.82)
================

Signed-off-by: Jeff Hostetler <jeffhost@xxxxxxxxxxxxx>
---
 unpack-trees.c | 23 +++++++++++++++++++----
 1 file changed, 19 insertions(+), 4 deletions(-)

diff --git a/unpack-trees.c b/unpack-trees.c
index 3a8ee19..143c5d9 100644
--- a/unpack-trees.c
+++ b/unpack-trees.c
@@ -531,6 +531,11 @@ static int switch_cache_bottom(struct traverse_info *info)
 	return ret;
 }
 
+static inline int are_same_oid(struct name_entry *name_j, struct name_entry *name_k)
+{
+	return name_j->oid && name_k->oid && !oidcmp(name_j->oid, name_k->oid);
+}
+
 static int traverse_trees_recursive(int n, unsigned long dirmask,
 				    unsigned long df_conflicts,
 				    struct name_entry *names,
@@ -554,10 +559,20 @@ static int traverse_trees_recursive(int n, unsigned long dirmask,
 	newinfo.df_conflicts |= df_conflicts;
 
 	for (i = 0; i < n; i++, dirmask >>= 1) {
-		const unsigned char *sha1 = NULL;
-		if (dirmask & 1)
-			sha1 = names[i].oid->hash;
-		buf[i] = fill_tree_descriptor(t+i, sha1);
+		if (i > 0 && are_same_oid(&names[i], &names[i-1])) {
+			/* implicitly borrow buf[i-1] inside tree_desc[i] */
+			memcpy(&t[i], &t[i-1], sizeof(struct tree_desc));
+			buf[i] = NULL;
+		} else if (i > 1 && are_same_oid(&names[i], &names[i-2])) {
+			/* implicitly borrow buf[i-2] inside tree_desc[i] */
+			memcpy(&t[i], &t[i-2], sizeof(struct tree_desc));
+			buf[i] = NULL;
+		} else {
+			const unsigned char *sha1 = NULL;
+			if (dirmask & 1)
+				sha1 = names[i].oid->hash;
+			buf[i] = fill_tree_descriptor(t+i, sha1);
+		}
 	}
 
 	bottom = switch_cache_bottom(&newinfo);
-- 
2.9.3