Make it possible for users of the -uall flag to git status, either by preference or by need (eg UI tooling), to benefit from the untracked cache when they set their 'status.showuntrackedfiles' config setting to 'all'. This is especially useful for large repos in Windows, where without untracked cache "git status" times can easily be 5x higher, and GUI tooling typically does use -uall. In this sixth version, clarify the main commit message and some of the code comments, and adjust the logic slightly such that an existing untracked cache structure, when consistent with the requested flags in the current run, can be reused even though the current config would set/store/use the other set of flags on a new untracked cache structure. I'm comfortable with this patch as-is, but am still interested in any thoughts as to whether it makes sense and is likely to be accepted to do this as a simple enhancement as proposed here, or whether people be more comfortable with a new configuration option, given the potential for worse performance under specific (and, I believe, vanishingly rare) circumstances. Tao Klerks (2): untracked-cache: test untracked-cache-bypassing behavior with -uall untracked-cache: support '--untracked-files=all' if configured dir.c | 88 ++++++++++++++++++----- t/t7063-status-untracked-cache.sh | 113 ++++++++++++++++++++++++++++++ 2 files changed, 185 insertions(+), 16 deletions(-) base-commit: 805e0a68082a217f0112db9ee86a022227a9c81b Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-985%2FTaoK%2Ftaok-untracked-cache-with-uall-v6 Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-985/TaoK/taok-untracked-cache-with-uall-v6 Pull-Request: https://github.com/gitgitgadget/git/pull/985 Range-diff vs v5: 1: 98a4d8f35c5 ! 1: f27f018493a untracked-cache: test untracked-cache-bypassing behavior with -uall @@ Commit message '--untracked-files=normal', and it gets ignored when '--untracked-files=all' is specified instead. - Add explicit tests for this behavior. + Add explicit tests for this known as-designed behavior. Signed-off-by: Tao Klerks <tao@xxxxxxxxxx> @@ t/t7063-status-untracked-cache.sh: test_expect_success 'untracked cache after se +?? three +EOF + -+# Bypassing the untracked cache here is not desirable, but it expected -+# in the current implementation ++# Bypassing the untracked cache here is not desirable from an ++# end-user perspective, but is expected in the current design. ++# The untracked cache data stored for a -unormal run cannot be ++# correctly used in a -uall run - it would yield incorrect output. +test_expect_success 'untracked cache is bypassed with -uall' ' + : >../trace.output && + GIT_TRACE2_PERF="$TRASH_DIRECTORY/trace.output" \ 2: f60d2c6e36c ! 2: 0095ec49c5f untracked-cache: support '--untracked-files=all' if configured @@ Commit message untracked-cache: support '--untracked-files=all' if configured Untracked cache was originally designed to only work with - '--untracked-files=normal', but this causes performance issues for UI - tooling that wants to see "all" on a frequent basis. On the other hand, - the conditions that prevented applicability to the "all" mode no - longer seem to apply. + "--untracked-files=normal", and is bypassed when + "--untracked-files=all" is requested, but this causes performance + issues for UI tooling that wants to see "all" on a frequent basis. - The disqualification of untracked cache has a particularly significant - impact on Windows with the defaulted fscache, where the introduction - of fsmonitor can make the first and costly directory-iteration happen - in untracked file detection, single-threaded, rather than in - preload-index on multiple threads. Improving the performance of a - "normal" 'git status' run with fsmonitor can make - 'git status --untracked-files=all' perform much worse. + On the other hand, the conditions that altogether prevented + applicability to the "all" mode no longer seem to apply, after + several major refactors in recent years; this possibility was + discussed in + 81153d02-8e7a-be59-e709-e90cd5906f3a@xxxxxxxxxxxxxxxxx and + CABPp-BFiwzzUgiTj_zu+vF5x20L0=1cf25cHwk7KZQj2YkVzXw@xxxxxxxxxxxxxx, + and somewhat confirmed experimentally by several users using a + version of this patch to use untracked cache with -uall for about a + year. + + When 'git status' runs without using the untracked cache, on a large + repo, on windows, with fsmonitor, it can run very slowly. This can + make GUIs that need to use "-uall" (and therefore currently bypass + untracked cache) unusable when fsmonitor is enabled, on such large + repos. To partially address this, align the supported directory flags for the stored untracked cache data with the git config. If a user specifies @@ dir.c: static void new_untracked_cache(struct index_state *istate) } } } -@@ dir.c: void remove_untracked_cache(struct index_state *istate) - } - - static struct untracked_cache_dir *validate_untracked_cache(struct dir_struct *dir, -- int base_len, -- const struct pathspec *pathspec, -- struct index_state *istate) -+ int base_len, -+ const struct pathspec *pathspec, -+ struct index_state *istate) - { - struct untracked_cache_dir *root; - static int untracked_cache_disabled = -1; @@ dir.c: static struct untracked_cache_dir *validate_untracked_cache(struct dir_struct *d if (base_len || (pathspec && pathspec->nr)) return NULL; @@ dir.c: static struct untracked_cache_dir *validate_untracked_cache(struct dir_st return NULL; } -+ /* -+ * We don't support using or preparing the untracked cache if -+ * the current effective flags don't match the configured -+ * flags. -+ */ -+ if (dir->flags != new_untracked_cache_flags(istate)) -+ return NULL; -+ + /* + * If the untracked structure we received does not have the same flags -+ * as configured, but the configured flags do match the effective flags, -+ * then we need to reset / create a new "untracked" structure to match -+ * the new config. -+ * Keeping the saved and used untracked cache in-line with the -+ * configuration provides an opportunity for frequent users of -+ * "git status -uall" to leverage the untracked cache by aligning their -+ * configuration (setting "status.showuntrackedfiles" to "all" or -+ * "normal" as appropriate), where previously this option was -+ * incompatible with untracked cache and *consistently* caused -+ * surprisingly bad performance (with fscache and fsmonitor enabled) on -+ * Windows. -+ * -+ * IMPROVEMENT OPPORTUNITY: If we reworked the untracked cache storage -+ * to not be as bound up with the desired output in a given run, -+ * and instead iterated through and stored enough information to -+ * correctly serve both "modes", then users could get peak performance -+ * with or without '-uall' regardless of their -+ * "status.showuntrackedfiles" config. ++ * as requested in this run, we're going to need to either discard the ++ * existing structure (and potentially later recreate), or bypass the ++ * untracked cache mechanism for this run. + */ + if (dir->flags != dir->untracked->dir_flags) { -+ free_untracked_cache(istate->untracked); -+ new_untracked_cache(istate, dir->flags); -+ dir->untracked = istate->untracked; ++ /* ++ * If the untracked structure we received does not have the same flags ++ * as configured, then we need to reset / create a new "untracked" ++ * structure to match the new config. ++ * ++ * Keeping the saved and used untracked cache consistent with the ++ * configuration provides an opportunity for frequent users of ++ * "git status -uall" to leverage the untracked cache by aligning their ++ * configuration - setting "status.showuntrackedfiles" to "all" or ++ * "normal" as appropriate. ++ * ++ * Previously using -uall (or setting "status.showuntrackedfiles" to ++ * "all") was incompatible with untracked cache and *consistently* ++ * caused surprisingly bad performance (with fscache and fsmonitor ++ * enabled) on Windows. ++ * ++ * IMPROVEMENT OPPORTUNITY: If we reworked the untracked cache storage ++ * to not be as bound up with the desired output in a given run, ++ * and instead iterated through and stored enough information to ++ * correctly serve both "modes", then users could get peak performance ++ * with or without '-uall' regardless of their ++ * "status.showuntrackedfiles" config. ++ */ ++ if (dir->untracked->dir_flags != new_untracked_cache_flags(istate)) { ++ free_untracked_cache(istate->untracked); ++ new_untracked_cache(istate, dir->flags); ++ dir->untracked = istate->untracked; ++ } ++ else { ++ /* ++ * Current untracked cache data is consistent with config, but not ++ * usable in this request/run; just bypass untracked cache. ++ */ ++ return NULL; ++ } + } + if (!dir->untracked->root) { @@ t/t7063-status-untracked-cache.sh: test_expect_success 'untracked cache remains + test_cmp ../trace.expect ../trace.relevant +' + -+# Bypassing the untracked cache here is not desirable, but it expected -+# in the current implementation ++# Bypassing the untracked cache here is not desirable from an ++# end-user perspective, but is expected in the current design. ++# The untracked cache data stored for a -all run cannot be ++# correctly used in a -unormal run - it would yield incorrect ++# output. +test_expect_success 'if -uall is configured, untracked cache is bypassed with -unormal' ' + test_config status.showuntrackedfiles all && + : >../trace.output && -- gitgitgadget