Re: Slow git add . performance in large repo


 



On 2025-03-17 at 18:53:10, Yissachar Radcliffe wrote:
> We have a relatively large git repo and have noticed that `git add .`
> operations are slow (~1.5-2s). We have core.fsmonitor and
> core.untrackedCache set to true and `git status` executes in ~300ms.
> When I turn on trace2 I can see that almost all the time is spent in
> read_directo and it's visiting 26960 directories and 77989 paths.
> 
> I can use `git add <foo>` or `git add -u .` to speed things up but
> `git add .` is the most convenient for us. I created a small script to
> pipe the results of `git status` to `git add` and that runs in <500ms.
> This leaves me confused as to why the built-in performance is so slow.

What you're asking for with those commands is different.  `git add -u .`
says, "Please enumerate only those files that are in the index, and if
they are modified or removed, update the index."  `git add .` says,
"Please enumerate every file in the working tree recursively and
determine if there are any non-ignored changes, and then update the
index."  (Note that a file that matches an ignore pattern but is already
tracked is not ignored, which affects the performance here.)
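Here is a minimal sketch of that difference in a throwaway repository. It assumes `git` and `mktemp -d` are available. `tracked.txt` and `untracked.txt` are made-up names for illustration:

```shell
#!/bin/sh
# Sketch: contrast `git add -u .` with `git add .` in a scratch repository.
# Assumes git and mktemp -d are available; file names are hypothetical.
set -eu

repo=$(mktemp -d)
trap 'rm -rf "$repo"' EXIT
cd "$repo"
git init -q

echo v1 > tracked.txt
git add tracked.txt            # tracked.txt is now in the index

echo v2 > tracked.txt          # modify the tracked file
echo new > untracked.txt       # create a file the index knows nothing about

git add -u .                   # only revisits paths already in the index
after_u=$(git ls-files)
echo "index after 'git add -u .':"
echo "$after_u"

git add .                      # walks the working tree and picks up new files
after_dot=$(git ls-files)
echo "index after 'git add .':"
echo "$after_dot"
```

Only the second form picks up `untracked.txt`, because only that form has to search the working tree for files the index doesn't know about.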

Notably, the former does not add new files that are untracked, but the
latter does.  That means that the code needs to know if there are any
new untracked files.  The untracked cache is not used when you specify
a pathspec on the command line because, in the general case, the
pathspec doesn't have to be just `.`: it could be something like an
attribute match or a glob pattern, and supporting that would make the
code very complex.  It is, however, used when you _don't_ specify a
pathspec (such as `git add -u`), as well as for `git status`, since
those operate on the whole tree without any pathspecs.

When you pipe the results of `git status` to `git add`, you are
effectively using the `-u` option, since that will only ever list files
that are tracked.
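The original script isn't shown, so the following is only a sketch of the same idea.  As a stand-in for parsing `git status` output, it uses `git diff --name-only -z`, which lists tracked paths whose working-tree copy differs from the index.  In a scratch repository with hypothetical file names, it checks that feeding those paths to `git add` leaves the same index as `git add -u`:

```shell
#!/bin/sh
# Sketch: hand only the tracked, changed paths to `git add` and compare the
# resulting index with the one `git add -u` produces. Scratch repository;
# file names are hypothetical, and `git diff --name-only -z` stands in for
# parsing `git status` output.
set -eu

repo=$(mktemp -d)
trap 'rm -rf "$repo"' EXIT
cd "$repo"
git init -q

printf 'a\n' > modified.txt
printf 'b\n' > deleted.txt
git add modified.txt deleted.txt   # both files are now tracked

printf 'a2\n' > modified.txt       # tracked and modified
rm deleted.txt                     # tracked and removed
printf 'c\n' > new.txt             # untracked: should stay out of the index

cp .git/index .git/index.orig      # snapshot so both approaches start equal

# Approach 1: list index-vs-worktree changes and feed them to `git add`.
git diff --name-only -z | xargs -0 git add --
via_pipe=$(git ls-files -s)

# Approach 2: restore the snapshot and let `git add -u` do the same work.
cp .git/index.orig .git/index
git add -u
via_u=$(git ls-files -s)

if [ "$via_pipe" = "$via_u" ]; then
  echo "same index either way:"
  echo "$via_u"
else
  echo "indexes differ" >&2
fi
```

Neither approach touches `new.txt`, and picking up files like that one is the part of `git add .` that requires walking the whole tree.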

I realize `git add .` is very convenient, but it does ask to do
substantially more work than `git add -u` (which I use quite
frequently), and so it can definitely perform worse, especially in
large repositories.  You can, of course, continue to use it, but you
can't expect the two commands to perform identically.  My recommendation would be to
use `git add -u` unless you need to add new files, since that's going to
perform better.  Once you get used to it, it's pretty easy to use.
-- 
brian m. carlson (they/them or he/him)
Toronto, Ontario, CA



