Re: git rm VERY slow for directories with many files.

"brian m. carlson" <sandals@xxxxxxxxxxxxxxxxxxxx> writes:

> On Sun, Oct 29, 2017 at 09:51:55AM +0900, Junio C Hamano wrote:
>> "brian m. carlson" <sandals@xxxxxxxxxxxxxxxxxxxx> writes:
>> > First, make sure your working directory is clean with no changes.  Then,
>> > remove the directory (by hand) or move it somewhere else.  Then, run
>> > "git add -u".
>> >
>> > That should allow you to commit the removal of those files quickly.
>> 
>> If get_tree_entry() shows up a lot in the profile, it would indicate
>> that a lot of cycles are spent in check_local_mod().  Bypassing it
>> with "-f" may be the first thing to try ;-)
>
> That is indeed faster.  I tested my solution by creating a directory
> with 20,000 files in a temporary repo.  git rm -r took 17.96s, and git
> rm -rf took 0.12s.  (This is on an SSD.)
>
> That's also a nicer and more intuitive solution than mine.

Heh, the above was meant as a joke, though.  "-f" is bypassing an
important safety valve.  In fact in my early draft of the message,
the paragraph that followed started with "Jokes aside, ..." ;-)

>> I wonder how fast "git diff-index --cached -r HEAD --", with the
>> same pathspec used for the problematic "git rm", runs in this same
>> 50,000 path project.
>
> I'll let the original poster answer this one as well, but it was very
> fast in my test repo.  I'm not very familiar with the code path in
> question, but it definitely looks like we're avoiding the quadratic
> behavior in this case.
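
For anyone who wants to repeat that measurement, here is a minimal
sketch of how the command could be timed from Python.  The helper
names (diff_index_argv, time_command) are made up for illustration;
only the git command line itself comes from the message above.

```python
import subprocess
import time


def diff_index_argv(pathspec, rev="HEAD"):
    """Build the argv for the diff-index invocation quoted in this thread.

    `pathspec` should be the same pathspec that was given to the slow
    "git rm" (for example the directory being removed).
    """
    return ["git", "diff-index", "--cached", "-r", rev, "--", pathspec]


def time_command(argv, cwd=None):
    """Run argv, discard its output, and return (seconds, exit status)."""
    start = time.perf_counter()
    proc = subprocess.run(
        argv,
        cwd=cwd,
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    return time.perf_counter() - start, proc.returncode
```

Calling time_command(diff_index_argv("some/dir"), cwd="/path/to/repo")
with a real repository path and directory would give a number to
compare against the "git rm -r" timing.  Both paths here are
placeholders.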

Because of the way "diff-index --cached" iterates over the index and
the tree in parallel, it should be a lot faster than doing
get_tree_entry() for each and every path you care about.  In
addition, the "--cached" form is further optimized to take advantage
of the cached-tree index extension, so you often can tell "all index
entries in this directory are untouched" without descending into
deep subdirectories.
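
The difference in shape can be illustrated with a toy model.  This is
only a sketch under loud assumptions: a "tree" is a nested Python list
of (name, child) pairs sorted by name, the "index" is a sorted list of
(path, blob id) pairs, and every function name is invented here.  It
is not git's code and it does not model the cached-tree shortcut; it
only counts how many entries each approach has to look at.

```python
def make_tree(n):
    """Toy tree: one directory "big" holding n files, sorted by name."""
    files = [(f"f{i:05d}", f"blob{i}") for i in range(n)]
    return [("big", files)]


def flatten(tree, prefix=""):
    """Yield (full path, blob id) for every file, in sorted path order."""
    for name, child in tree:
        if isinstance(child, list):
            yield from flatten(child, prefix + name + "/")
        else:
            yield prefix + name, child


def lookup_from_root(tree, path):
    """Resolve one path starting at the root; return (blob id, entries scanned)."""
    scanned = 0
    node = tree
    for part in path.split("/"):
        if not isinstance(node, list):
            return None, scanned
        for name, child in node:  # linear scan of one directory's entries
            scanned += 1
            if name == part:
                node = child
                break
        else:
            return None, scanned
    return node, scanned


def per_path_diff(index, tree):
    """One root-to-leaf lookup per index entry; cost grows quadratically."""
    changed, cost = [], 0
    for path, oid in index:
        found, scanned = lookup_from_root(tree, path)
        cost += scanned
        if found != oid:
            changed.append(path)
    return changed, cost


def parallel_diff(index, tree):
    """Walk the sorted index and the sorted tree side by side, once."""
    changed, steps = [], 0
    tree_iter = flatten(tree)
    t = next(tree_iter, None)
    for path, oid in index:
        while t is not None and t[0] < path:  # in tree, missing from index
            steps += 1
            changed.append(t[0])
            t = next(tree_iter, None)
        steps += 1
        if t is not None and t[0] == path:
            if t[1] != oid:
                changed.append(path)
            t = next(tree_iter, None)
        else:  # in index, not in tree
            changed.append(path)
    while t is not None:  # leftover tree entries are missing from the index
        steps += 1
        changed.append(t[0])
        t = next(tree_iter, None)
    return changed, steps
```

With 200 files in the toy directory, per_path_diff scans 20,300
entries while parallel_diff takes 200 steps, and both report the same
set of changed paths.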
