From: Jeff Hostetler <jeffhost@xxxxxxxxxxxxx>

Teach add_index_entry_with_check() and has_dir_name()
to see if the path of the new item is greater than the
last path in the index array before attempting to search
for it.

During checkout, merge_working_tree() populates the new
index in sorted order, so this change will save at least 2
binary lookups per file.  This preserves the original
behavior but simply checks the last element before starting
the search.

This helps performance on very large repositories.

================
Before and after numbers on index with 1M files
./p0004-read-tree.sh
0004.2: read-tree work1 (1003037)          3.21(2.54+0.62)
0004.3: switch base work1 (3038 1003037)   7.49(5.39+1.84)
0004.5: switch work1 work2 (1003037)       11.91(8.38+3.00)
0004.6: switch commit aliases (1003037)    12.22(8.30+3.06)

./p0004-read-tree.sh
0004.2: read-tree work1 (1003040)          2.40(1.65+0.73)
0004.3: switch base work1 (3041 1003040)   6.07(4.12+1.66)
0004.5: switch work1 work2 (1003040)       10.23(6.76+2.92)
0004.6: switch commit aliases (1003040)    10.53(6.97+2.83)
================

Signed-off-by: Jeff Hostetler <jeffhost@xxxxxxxxxxxxx>
---
 read-cache.c | 46 ++++++++++++++++++++++++++++++++++++++++++++--
 1 file changed, 44 insertions(+), 2 deletions(-)

diff --git a/read-cache.c b/read-cache.c
index 97f13a1..a8ef823 100644
--- a/read-cache.c
+++ b/read-cache.c
@@ -918,9 +918,24 @@ static int has_dir_name(struct index_state *istate,
 	int stage = ce_stage(ce);
 	const char *name = ce->name;
 	const char *slash = name + ce_namelen(ce);
+	size_t len_eq_last;
+	int cmp_last = 0;
+
+	if (istate->cache_nr > 0) {
+		/*
+		 * Compare the entry's full path with the last path in the index.
+		 * If it sorts AFTER the last entry in the index and they have no
+		 * common prefix, then there cannot be any F/D name conflicts.
+		 */
+		cmp_last = strcmp_offset(name,
+			istate->cache[istate->cache_nr-1]->name,
+			&len_eq_last);
+		if (cmp_last > 0 && len_eq_last == 0)
+			return retval;
+	}
 
 	for (;;) {
-		int len;
+		size_t len;
 
 		for (;;) {
 			if (*--slash == '/')
@@ -930,6 +945,24 @@ static int has_dir_name(struct index_state *istate,
 		}
 		len = slash - name;
 
+		if (cmp_last > 0) {
+			/*
+			 * If this part of the directory prefix (including the trailing
+			 * slash) already appears in the path of the last entry in the
+			 * index, then we cannot also have a file with this prefix (or
+			 * any parent directory prefix).
+			 */
+			if (len+1 <= len_eq_last)
+				return retval;
+			/*
+			 * If this part of the directory prefix (excluding the trailing
+			 * slash) is longer than the known equal portions, then this part
+			 * of the prefix cannot collide with a file. Go on to the parent.
+			 */
+			if (len > len_eq_last)
+				continue;
+		}
+
 		pos = index_name_stage_pos(istate, name, len, stage);
 		if (pos >= 0) {
 			/*
@@ -1021,7 +1054,16 @@ static int add_index_entry_with_check(struct index_state *istate, struct cache_e
 
 	if (!(option & ADD_CACHE_KEEP_CACHE_TREE))
 		cache_tree_invalidate_path(istate, ce->name);
-	pos = index_name_stage_pos(istate, ce->name, ce_namelen(ce), ce_stage(ce));
+
+	/*
+	 * If this entry's path sorts after the last entry in the index,
+	 * we can avoid searching for it.
+	 */
+	if (istate->cache_nr > 0 &&
+		strcmp(ce->name, istate->cache[istate->cache_nr - 1]->name) > 0)
+		pos = -istate->cache_nr - 1;
+	else
+		pos = index_name_stage_pos(istate, ce->name, ce_namelen(ce), ce_stage(ce));
 
 	/* existing match? Just replace it. */
 	if (pos >= 0) {
-- 
2.9.3
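
For readers without the rest of the series at hand, the fast path can
be sketched outside of git. This is only a minimal illustration under
stated assumptions: sketch_strcmp_offset() is a hypothetical stand-in
for strcmp_offset() (the real helper is defined outside this diff and
may differ), the index is reduced to a sorted array of C strings, and
stages and length-aware name comparison are ignored. All sketch_*
names are invented for this example.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical stand-in for strcmp_offset(): orders like strcmp() and
 * also reports how many leading bytes the two strings have in common.
 */
static int sketch_strcmp_offset(const char *a, const char *b,
				size_t *first_change)
{
	size_t k = 0;

	while (a[k] && a[k] == b[k])
		k++;
	*first_change = k;
	return (unsigned char)a[k] - (unsigned char)b[k];
}

/*
 * Plain binary search over a sorted array of names.
 * Returns the index when found, else -(insertion point) - 1.
 */
static int sketch_search(const char **names, int nr, const char *name)
{
	int lo = 0, hi = nr;

	while (lo < hi) {
		int mid = lo + (hi - lo) / 2;
		int cmp = strcmp(name, names[mid]);

		if (!cmp)
			return mid;
		if (cmp < 0)
			hi = mid;
		else
			lo = mid + 1;
	}
	return -lo - 1;
}

/*
 * Lookup with the fast path: a name that sorts after the last element
 * can only be inserted at the end, so the search is skipped.
 */
static int sketch_pos(const char **names, int nr, const char *name)
{
	assert(names && nr >= 0);

	if (nr > 0 && strcmp(name, names[nr - 1]) > 0)
		return -nr - 1;
	return sketch_search(names, nr, name);
}
```

With names = { "a/b", "a/c", "d" }, sketch_pos(names, 3, "e") takes
the fast path and returns -4, the same value the binary search would
produce, while "a/bb" sorts before "d" and falls through to the
search. For "a/c/x" against a last entry of "a/b", the stand-in
helper reports a common prefix of 2 bytes ("a/"), which is the kind
of value has_dir_name() compares against each directory prefix length.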