Junio C Hamano <gitster@xxxxxxxxx> writes:

> I wonder how fast "git diff-index --cached -r HEAD --", with the
> same pathspec used for the problematic "git rm", runs in this same
> 50,000 path project.
>
> If it runs in a reasonable time, one easy way out may be to revamp
> the codepath to call check_local_mod() to:
>
>  - first before making the call, do the "diff-index --cached" thing
>    internally with the same pathspec to grab the list of paths that
>    have local modifications; save the set of paths in a hashmap or
>    something.
>
>  - pass that hashmap to check_local_mod(), and where the function
>    does the "staged_changes" check, consult the hashmap to see the
>    path in question is different between the HEAD and the index.

And if we want to try a more localized band-aid, another approach
may be to add a caching version of get_tree_entry() where we keep
track of a stack of trees, the path component we found in each of
them during the last call to the helper, and the tree_desc.

That way, when we get the next call, we descend that stack as long
as the leading path components are still the same, and when we see
that the path component we are looking for is different from what we
used in the last call, we either (1) reuse the tree_desc and keep
going forward if the name we looked for last sorts before what we
are looking for, or (2) discard and reopen the tree, rewinding the
tree_desc to the beginning, and redo the scan.

That way, the caller of check_local_mod() does not have to know
the trick, and because the loop in check_local_mod() iterates over
the list that is already sorted in the index order, we'd not just
reduce the number of times we open the trees but also reduce the
number of times we scan and skip the entries in trees to find the
entries we are after.
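
To make the caching get_tree_entry() idea a bit more concrete, here
is a rough sketch on a toy in-memory tree. It is not git code and
does not use the real tree_desc API: toy_entry, toy_tree,
lookup_cache, cached_lookup() and the sample trees are all made up
for illustration, and entry names are compared bytewise, which only
approximates how real trees are sorted.

```c
#include <assert.h>
#include <string.h>

#define MAX_DEPTH 16

struct toy_tree;

/* Hypothetical stand-in for one tree entry (not git's layout). */
struct toy_entry {
	const char *name;
	const struct toy_tree *subtree;	/* NULL for a blob */
	int id;				/* stand-in for an object name */
};

/* Hypothetical stand-in for a tree object: entries sorted by name. */
struct toy_tree {
	const struct toy_entry *entries;
	int nr;
};

/* One remembered level: which tree, and the entry the last call found. */
struct level {
	const struct toy_tree *tree;
	int pos;
};

struct lookup_cache {
	struct level stack[MAX_DEPTH];
	int depth;		/* number of valid levels in stack[] */
	long opened, scanned;	/* counters, only to make the savings visible */
};

/* Compare an entry name with the first len bytes of a path component. */
static int name_cmp(const char *name, const char *comp, size_t len)
{
	int cmp = strncmp(name, comp, len);
	return cmp ? cmp : (unsigned char)name[len];
}

static const struct toy_entry *cached_lookup(struct lookup_cache *c,
					     const struct toy_tree *tree,
					     const char *path)
{
	int i;

	assert(c && tree && path);
	for (i = 0; i < MAX_DEPTH; i++) {
		size_t len = strcspn(path, "/");
		struct level *lv = &c->stack[i];
		const struct toy_entry *e;
		int cmp = 1;	/* "nothing cached" is handled like case (2) */

		if (i < c->depth && lv->tree == tree)
			cmp = name_cmp(tree->entries[lv->pos].name, path, len);
		if (cmp) {
			c->depth = i + 1;	/* anything deeper is stale now */
			if (cmp > 0) {		/* (2) (re)open and rewind */
				lv->tree = tree;
				lv->pos = 0;
				c->opened++;
			}			/* (1) else keep going forward */
		}
		while (lv->pos < tree->nr &&
		       name_cmp(tree->entries[lv->pos].name, path, len) < 0) {
			lv->pos++;
			c->scanned++;
		}
		if (lv->pos == tree->nr ||
		    name_cmp(tree->entries[lv->pos].name, path, len)) {
			c->depth = i;	/* miss: forget this level */
			return NULL;
		}
		e = &tree->entries[lv->pos];
		if (!path[len])
			return e;	/* that was the last component */
		if (!e->subtree)
			return NULL;	/* a blob cannot have children */
		tree = e->subtree;
		path += len + 1;
	}
	return NULL;	/* path is deeper than MAX_DEPTH */
}

/* Made-up sample data: Makefile, src/{a.c,b.c,lib/{x.c,y.c},z.c} */
static const struct toy_entry lib_entries[] = {
	{ "x.c", NULL, 4 }, { "y.c", NULL, 5 },
};
static const struct toy_tree lib_tree = { lib_entries, 2 };
static const struct toy_entry src_entries[] = {
	{ "a.c", NULL, 2 }, { "b.c", NULL, 3 },
	{ "lib", &lib_tree, 0 }, { "z.c", NULL, 6 },
};
static const struct toy_tree src_tree = { src_entries, 4 };
static const struct toy_entry root_entries[] = {
	{ "Makefile", NULL, 1 }, { "src", &src_tree, 0 },
};
static const struct toy_tree root_tree = { root_entries, 2 };
```

With a zero-initialized lookup_cache, looking up src/a.c, src/b.c,
src/lib/x.c, src/lib/y.c and src/z.c in that (index) order opens
each of the three toy trees only once and never rescans an entry it
has already skipped. Asking for src/a.c again afterwards takes path
(2) and rewinds the src level. Real trees sort directory names as if
they ended with '/', which this toy ignores; a caller that feeds
paths in a slightly different order than the cache expects would
still get correct answers through (2), only with more rewinds.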