Hi Tao,

thank you for chiming in! It is good to see that more people are dabbling
with the built-in FSMonitor.

On Thu, 10 Jun 2021, Tao Klerks wrote:

> With the new "core.useBuiltinFSMonitor" support in the Windows
> installer, I think this subject is worth calling out explicitly (and
> my apologies if I missed prior discussion):
>
> TL;DR:
> - I believe "core.untrackedcache" should be enabled by default in
> Windows (and it does not appear to be)

Stolee indicated something similar that matches your observation:
https://lore.kernel.org/git/af7a671c-fa32-6d9a-7d75-65582fdbcf24@xxxxxxxxx/

    Interestingly, the untracked cache extension makes a big
    difference here. The performance of the overall behavior is much
    faster if the untracked cache exists (when paired with the builtin
    FS Monitor; it doesn't make a significant difference when FS
    Monitor is disabled).

> - If a user enables "core.useBuiltinFSMonitor" (eg in the installer)
> in the hopes of getting snappy "git status" on a repo with a large
> deep working tree, they will be *unnecessarily* disappointed if
> "core.untrackedcache" is not enabled

Yes. And. Unfortunately, there is an "and".

I recently got a chance to work with the Functional Tests of Scalar ("an
opinionated repository management tool" on top of Git, see
https://github.com/microsoft/scalar#readme for more details). Essentially,
you can think of that test suite as integration tests for Git in a
large-scale context.

And there, I ran into trouble with the untracked cache on Windows (where
it really provides the most benefit). The gist of it is that _sometimes_,
the mtime of a directory seems not to be updated immediately after an item
in it was modified/added/deleted. And that mtime is precisely what the
untracked cache depends on.

The funny thing is: while the output of `git status` will therefore at
first fail to pick up on, say, a new untracked file, running `git status`
_immediately_ afterwards _will succeed_ in seeing that untracked file.

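To make that failure mode concrete, here is a toy model in Python. It is
not Git's actual implementation: the class `DirListingCache` and the helper
`demo` are made up for this sketch, and the delayed mtime update is
simulated with `os.utime` rather than reproduced on a real Windows file
system. It only shows why a cache keyed on a directory's mtime reports
stale results until that mtime moves.

```python
import os
import tempfile


class DirListingCache:
    """Hypothetical cache: reuse a directory listing while the mtime is unchanged."""

    def __init__(self, path):
        self.path = path
        self.cached_mtime_ns = None
        self.cached_entries = None

    def entries(self):
        mtime_ns = os.stat(self.path).st_mtime_ns
        if mtime_ns != self.cached_mtime_ns:
            # First call, or the directory mtime moved: rescan and remember it.
            self.cached_entries = sorted(os.listdir(self.path))
            self.cached_mtime_ns = mtime_ns
        return self.cached_entries


def demo():
    with tempfile.TemporaryDirectory() as root:
        cache = DirListingCache(root)
        first = cache.entries()
        old_mtime = cache.cached_mtime_ns

        # Add a new file, then force the directory mtime back to its old
        # value to simulate an update that has not landed yet.
        open(os.path.join(root, "new.txt"), "w").close()
        os.utime(root, ns=(old_mtime, old_mtime))
        missed = cache.entries()

        # Simulate the mtime finally catching up (10 seconds later).
        later = old_mtime + 10 * 1_000_000_000
        os.utime(root, ns=(later, later))
        seen = cache.entries()
        return first, missed, seen
```

In this model, the second lookup still returns the empty listing because
the recorded mtime matches, and only the third lookup notices `new.txt`.
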
So there is something fishy going on with updating things (it might even
be a foul interaction between the FSCache and the untracked cache, but I
have no evidence to back that up or to disprove it).

It is one of my big TODOs to look into that. If you have any insights, or
time to investigate, I would be really interested.

> - There is also a lingering "problem" with "git status -uall", with
> both "core.useBuiltinFSMonitor" and "core.fsmonitor", but that seems
> far less trivial to address

Interesting. I guess the untracked cache might become too clunky with many
untracked files? Or is there something else going on?

> Detail:
>
> I just started testing the new "core.useBuiltinFSMonitor" option in
> the new installer, and it's amazing, thanks Ben, Alex, Johannes and
> Kevin!

Not to forget Jeff Hostetler, who essentially spent the past half year on
it on his own.

> However, when I first enabled it, I was getting slightly *worse* git
> status times than without it... and those worse git status times were
> accompanied by a message along the lines of:
> ---
> It took 5.88 seconds to enumerate untracked files. 'status -uno' may
> speed it up, but you have to be careful not to forget to add new files
> yourself (see 'git help status').
> ---
>
> For context, this is in a repo with 200,000 or so files, within 40,000
> folders (avg path depth 4 I think?), with a reasonably-intricate set
> of .gitignore patterns. Obviously that's not "your average user", but
> I would imagine it matches "the target audience for
> 'core.useBuiltinFSMonitor'" pretty well.

Right. I had a somewhat similar setup, with Git for Windows' SDK, which
consists of ~160k files in ~8k directories. My `.gitignore` consists of
only ~40 heavily commented lines (containing five lines with wildcards),
but I do have a `.git/info/exclude` that contains a set of generated
file/directory lists, i.e. without any wildcards. This `exclude` file is
~26k lines long.

A cold-cache `git status` takes ~24sec, a warm-cache one ~10sec (with the
built-in FSMonitor daemon now active). My guess is that the amount of work
to match the untracked vs ignored files is dominating the entire
operation, by a lot.

> After a little head-scratching, I recalled an exchange with Johannes
> from last year:
> https://lore.kernel.org/git/CAPMMpohJicVeCaKsPvommYbGEH-D1V02TTMaiVTV8ux+9z9vkQ@xxxxxxxxxxxxxx/
>
> I never did understand the relevant code paths in much detail, but the
> practical conclusions were:
> - Without "core.untrackedcache" enabled, git ends up iterating
> through the entire path structure of the working tree *even if
> "core.fsmonitor" (and now "core.useBuiltinFSMonitor") is enabled*,
> looking for untracked files to report
> - Even with "core.untrackedcache" enabled, if "core.fsmonitor" (and
> now "core.useBuiltinFSMonitor") is enabled, git iterates through the
> entire path structure of the working tree *single-threaded* when the
> "--untracked-files" mode is set to "all" (by config or command-line)
>
> Now, I imagine that addressing/improving these behaviors is very
> non-trivial, but the impact could be reasonably limited if:
> - core.untrackedcache were defaulted to "true", at least under
> Windows, at least when the installer is asked to set
> core.useBuiltinFSMonitor

As soon as I can fix the flakiness of the untracked cache on Windows, I
will do that!

> - The "It took N.NN seconds to enumerate untracked files" message
> were to include a hint about core.untrackedcache, at least when the
> "--untracked-files" mode is set to "normal".
>
> Final note: I personally would love to see "core.useBuiltinFSMonitor
> actually makes things slower, when --untracked-files=all is specified"
> behavior be addressed,

Yes, we need to spend some quality time with some perf tools there.

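As a starting point for such measurements, the four combinations discussed
above (untracked cache on/off, `--untracked-files` set to `normal` or
`all`) could be timed with a small script. This is only a rough sketch:
the helper names `status_command` and `time_status` are invented here, it
measures plain wall-clock time rather than using a real profiler, and
overriding `core.untrackedCache` per invocation makes Git add or remove the
untracked cache extension in the index, so it should be tried on a scratch
clone.

```python
import subprocess
import time


def status_command(untracked_cache, untracked_files):
    """Build a `git status` invocation with a per-command config override."""
    value = "true" if untracked_cache else "false"
    return [
        "git",
        "-c", f"core.untrackedCache={value}",
        "status",
        "--porcelain",
        f"--untracked-files={untracked_files}",
    ]


def time_status(repo, untracked_cache, untracked_files, runs=3):
    """Return the fastest wall-clock time in seconds over `runs` invocations."""
    cmd = status_command(untracked_cache, untracked_files)
    timings = []
    for _ in range(runs):
        start = time.perf_counter()
        # Discard the output; only the elapsed time is of interest here.
        subprocess.run(cmd, cwd=repo, check=True, stdout=subprocess.DEVNULL)
        timings.append(time.perf_counter() - start)
    return min(timings)
```

Calling `time_status(".", True, "all")` from inside a worktree, and then
the other three combinations, gives numbers that can be compared with and
without the FSMonitor daemon running.
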
> because common windows git integrations or front-ends like Git
> Extensions or IntelliJ IDEA commonly use those options, and therefore
> "suffer" a performance degradation on at least some operations when
> core.useBuiltinFSMonitor is enabled.
>
> I don't know whether this is the right place to report Windows-centric
> concerns, if not, my apologies.

I would not necessarily call them "Windows-centric", even if yes, at the
moment the built-in FSMonitor is most easily enabled on Windows (because I
added that experimental option in Git for Windows' installer, after
integrating the experimental feature).

Instead, I consider this more the type of feedback concerning large
worktrees, and what Git can do to support that use case better. This
applies in particular to the built-in FSMonitor, which already supports
Windows and macOS; hopefully we will find volunteers to work on the Linux
side soon, too.

In my mind, the built-in FSMonitor, the untracked cache, and
`git maintenance` are _crucial_ tools to allow Git to scale up.

So: thank you for your wonderful feedback!

Ciao,
Dscho