"brian m. carlson" <sandals@xxxxxxxxxxxxxxxxxxxx> writes: > On Sun, Oct 29, 2017 at 09:51:55AM +0900, Junio C Hamano wrote: >> "brian m. carlson" <sandals@xxxxxxxxxxxxxxxxxxxx> writes: >> > First, make sure your working directory is clean with no changes. Then, >> > remove the directory (by hand) or move it somewhere else. Then, run >> > "git add -u". >> > >> > That should allow you to commit the removal of those files quickly. >> >> If get_tree_entry() shows up a lot in the profile, it would indicate >> that a lot of cycles are spent in check_local_mod(). Bypassing it >> with "-f" may be the first thing to try ;-) > > That is indeed faster. I tested my solution by creating a directory > with 20,000 files in a temporary repo. git rm -r took 17.96s, and git > rm -rf took .12s. (This is on an SSD.) > > That's also a nicer and more intuitive solution than mine. Heh, the above was meant as a joke, though. "-f" is bypassing an important safety valve. In fact in my early draft of the message, the paragraph that followed started with "Jokes aside, ..." ;-) >> I wonder how fast "git diff-index --cached -r HEAD --", with the >> same pathspec used for the problematic "git rm", runs in this same >> 50,000 path project. > > I'll let the original poster answer this one as well, but it was very > fast in my test repo. I'm not very familiar with the code path in > question, but it definitely looks like we're avoiding the quadratic > behavior in this case. Because of the way "diff-index --cached" iterates over the index and the tree in parallel, it should be a lot faster than doing get_tree_entry() for each and every path you care about. In addition, the "--cached" form is further optimized to take advantage of the cached-tree index extension, so you often can tell "all index entries in this directory are untouched" without descending into deep subdirectories.