Re: Question about fsmonitor and --untracked-files=all

Hi Johannes,

Thanks for the tip; unfortunately, it doesn't seem to have had any
positive effect.

With "git config core.fscache false", everything takes longer except a
plain "git status" with both fsmonitor and the untrackedCache enabled
(in which case I guess nothing ends up needing the filesystem). That
combination (fsmonitor enabled, untrackedCache enabled, plain "git
status") is the *only* one I've found so far that doesn't force a
directory scan. *When* there is a directory scan (because of
"--untracked-files=all", because fsmonitor is disabled, or because the
untrackedCache is disabled), having fscache disabled makes things
significantly slower (from 20% slower to double the time, depending on
the exact combination).

I tried to stumble my way around some of the source code, and I
suspect I've found at least one explanation: The untracked cache
appears to be ignored when "--untracked-files=all" is specified, and
this appears to be intentional:
* In wt-status.c#wt_status_collect_untracked(), the "dir.flags" are
only updated to include "DIR_SHOW_OTHER_DIRECTORIES" when the
"SHOW_ALL_UNTRACKED_FILES" mode is *not* requested
* In later logic nested in dir.c#validate_untracked_cache(), the
absence of the "DIR_SHOW_OTHER_DIRECTORIES" flag causes the
validation to fail and, one level up in read_directory(), this causes
the untracked cache structure to be discarded

The relevant comment in "validate_untracked_cache()" says "See
treat_directory(), case index_nonexistent. Without this
[DIR_SHOW_OTHER_DIRECTORIES] flag, we may need to also cache .git file
content for the resolve_gitlink_ref() call, which we don't." I can't
claim to understand the comment, the relationship to gitlinks, etc. :(

Does this look like something solvable? It seems that supporting the
untrackedCache even with "--untracked-files=all" would make a
(potentially) large difference to git status performance in some
workflows with fsmonitor enabled.

(All that said, I still don't understand why the presence of the
fsmonitor hook switches the behavior from *multi-threaded* scanning of
all directory contents (without fsmonitor) to *single-threaded*
scanning for untracked files specifically (with fsmonitor).)

Thanks for looking; any further thoughts will of course be most appreciated!

Tao Klerks

On Wed, Sep 23, 2020 at 4:42 PM Johannes Schindelin
<Johannes.Schindelin@xxxxxx> wrote:
>
> Hi Tao,
>
> On Tue, 22 Sep 2020, Tao Klerks wrote:
>
> > I've got a couple of questions about the "fsmonitor" functionality,
> > untracked files, and multithreading.
> >
> > Background:
> >
> > In a repo with:
> >  * A couple hundred thousand tracked files, and a couple hundred
> > thousand .gitignored files, across a few thousand directories
> >  * The --untracked-cache setting, tested and working
> >  * core.fsmonitor set up with watchman (with the sample integration
> > script from January)
> >  * Git version 2.27.0.windows.1
> >
> > "git status" takes about 2s
> > "git status --untracked-files=all" takes about 20s
> >
> > When I turn off "core.fsmonitor", the numbers change to something like:
> > "git status": 8s
> > "git status --untracked-files=all": 9s
> >
> > Using Windows' "procmon" to observe git.exe's behavior from outside, I
> > think I've understood a couple of things that surprise me:
> > 1. when you specify "--untracked-files=all", git scans the entire
> > folder tree regardless of the "fsmonitor" hook
> > 2. when you specify the "fsmonitor" hook, git does any
> > filesystem scanning in a single-threaded fashion (as opposed to
> > multi-threaded without "fsmonitor" / normally)
> >
> > These two things combine so that with "fsmonitor" set, normal
> > command-line git status performance is great, but the performance in
> > tools that eagerly look for untracked files (like "Git Extensions" on
> > Windows) actually suffers; it takes twice as long to run the 'git -c
> > diff.ignoreSubModules=none status --porcelain=2 -z
> > --untracked-files=all' command that this UI wants (and blocks on, when
> > you go to a commit dialog).
> >
> > Questions:
> >
> > 1. Is there a reason "--untracked-files=all" causes a full directory
> > tree scan even with the "fsmonitor" hook active, or is this
> > accidental?
>
> I have a hunch that this might be related to a performance hack we have in
> Git for Windows: did you enable FSCache perchance?
>
> If so, I _suspect_ that turning it off would accelerate `git status
> --untracked-files=all`.
>
> Ciao,
> Johannes
>
> > 2. Assuming that the full directory tree scan is indeed necessary even
> > with "fsmonitor" (when requesting all untracked files), could it be
> > made multithreaded?
> >
> > (my apologies for the simplistic "outside-in" observations; I don't
> > feel qualified to attempt to understand the git source code)
> >
> > Thanks for any help understanding the optimization opportunities here!
> >
> > Tao Klerks
> >
