Re: git rm VERY slow for directories with many files.

"brian m. carlson" <sandals@xxxxxxxxxxxxxxxxxxxx> writes:

>> Looking at an optimized profile, all the time seems to be spent in “get_tree_entry” — I assume there is some huge object representing the directory which is being re-expanded for each file?
>
> Yes, there's a tree object that represents each directory.
>
>> Is there any way I can speed up removing this directory?
>
> First, make sure your working directory is clean with no changes.  Then,
> remove the directory (by hand) or move it somewhere else.  Then, run
> "git add -u".
>
> That should allow you to commit the removal of those files quickly.

If get_tree_entry() shows up a lot in the profile, it would indicate
that a lot of cycles are spent in check_local_mod().  Bypassing it
with "-f" may be the first thing to try ;-)

The way "git rm" makes repeated calls to get_tree_entry() with deep
pathnames is an easy recipe for quadratic behaviour like the one
reported in the first message on this thread: it always starts from
the root level, grabs a tree object and scans it to get the entry
for the next level, and (worse yet) the look-up of a path component
in each of these tree objects must be done as a linear scan.

I wonder how fast "git diff-index --cached -r HEAD --", with the
same pathspec used for the problematic "git rm", runs in this same
50,000-path project.

If it runs in a reasonable time, one easy way out may be to revamp
the codepath that calls check_local_mod() so that it:

 - first, before making the call, does the "diff-index --cached"
   thing internally with the same pathspec to grab the list of paths
   whose index entries differ from HEAD, and saves that set of paths
   in a hashmap or something similar.

 - passes that hashmap to check_local_mod(), and where the function
   does the "staged_changes" check, consults the hashmap to see
   whether the path in question differs between HEAD and the index.