[PATCH v2] untracked-cache: support '--untracked-files=all' if configured

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



From: Tao Klerks <tao@xxxxxxxxxx>

Untracked cache was originally designed to only work with
'-untracked-files=normal', but this is a concern for UI tooling that
wants to see "all" on a frequent basis, and the conditions that
prevented applicability to the "all" mode no longer seem to apply.

The disqualification of untracked cache is a particular problem on
Windows with the defaulted fscache, where the introduction of
fsmonitor can make the first and costly directory-iteration happen
in untracked file detection, single-threaded, rather than in
preload-index on multiple threads. Improving the performance of a
"normal" 'git status' run with fsmonitor can make
'git status --untracked-files=all' perform much worse.

In this change, align the supported directory flags for the stored
untracked cache data with the git config. If a user specifies an
'--untracked-files=' commandline parameter that does not align with
their 'status.showuntrackedfiles' config value, then the untracked
cache will be ignored - as it is for other unsupported situations like
when a pathspec is specified.

If the previously stored flags no longer match the current
configuration, but the currently-applicable flags do match the current
configuration, then the previously stored untracked cache data is
discarded.

For most users there will be no change in behavior. Users who need
'--untracked-files=all' to perform well will have the option of
setting "status.showuntrackedfiles" to "all".

Users who need '--untracked-files=all' to perform well for their
tooling AND prefer to avoid the verbosity of "all" when running
git status explicitly without options... are out of luck for now (no
change).

Users who set "status.showuntrackedfiles" to "all" and yet most
frequently explicitly call 'git status --untracked-files=normal' (and
use the untracked cache) are the only ones who would be disadvantaged
by this change. It seems unlikely there are any such users.

Signed-off-by: Tao Klerks <tao@xxxxxxxxxx>
---
    Support untracked cache with '--untracked-files=all' if configured
    
    Resending this patch from a few months ago (with better standards
    compliance) - it hasn't seen any comments yet, I would dearly love some
    eyes on this as the change can make a big difference to hundreds of
    windows users in my environment (and I'd really rather not start
    distributing customized git builds!)
    
    This patch aims to make it possible for users of the -uall flag to git
    status, either by preference or by need (eg UI tooling), to benefit from
    the untracked cache when they set their 'status.showuntrackedfiles'
    config setting to 'all'. This is very important for large repos in
    Windows.
    
    More detail on the change and context in the commit message, I assume
    repeating a verbose message here is discouraged.
    
    These changes result from a couple of conversations,
    81153d02-8e7a-be59-e709-e90cd5906f3a@xxxxxxxxxxxxxxxxx and
    CABPp-BFiwzzUgiTj_zu+vF5x20L0=1cf25cHwk7KZQj2YkVzXw@xxxxxxxxxxxxxx>.
    
    The test suite passes, but as a n00b I do have questions:
    
     * Is there any additional validation that could/should be done to
       confirm that "-uall" untracked data can be cached safely?
    
    ** It looks safe from following the code ** It seems from discussing
    briefly with Elijah Newren in the thread above that thare are no obvious
    red flags ** Manual testing, explicitly and implicitly through months of
    use, yields no issues
    
     * Is it OK to check the repo configuration in the body of processing?
       It seems to be a rare pattern
     * Can anyone think of a way to test this change?

Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-985%2FTaoK%2Ftaok-untracked-cache-with-uall-v2
Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-985/TaoK/taok-untracked-cache-with-uall-v2
Pull-Request: https://github.com/gitgitgadget/git/pull/985

Range-diff vs v1:

 1:  2797efad9a4 ! 1:  e2f1ad26c78 Support untracked cache with '--untracked-files=all' if configured
     @@ Metadata
      Author: Tao Klerks <tao@xxxxxxxxxx>
      
       ## Commit message ##
     -    Support untracked cache with '--untracked-files=all' if configured
     +    untracked-cache: support '--untracked-files=all' if configured
      
          Untracked cache was originally designed to only work with
     -    '-untracked-files=normal', but this is a concern for UI tooling
     -    that wants to see "all" on a frequent basis, and the conditions
     -    that prevented applicability to the "all" mode no longer seem to
     -    apply.
     +    '-untracked-files=normal', but this is a concern for UI tooling that
     +    wants to see "all" on a frequent basis, and the conditions that
     +    prevented applicability to the "all" mode no longer seem to apply.
      
          The disqualification of untracked cache is a particular problem on
          Windows with the defaulted fscache, where the introduction of
     @@ Commit message
          "normal" 'git status' run with fsmonitor can make
          'git status --untracked-files=all' perform much worse.
      
     -    Here we align the supported directory flags for the stored
     +    In this change, align the supported directory flags for the stored
          untracked cache data with the git config. If a user specifies an
     -    '--untracked-files=' commandline parameter that does not align
     -    with their 'status.showuntrackedfiles' config value, then the
     -    untracked cache will be ignored - as it is for other unsupported
     -    situations like when a pathspec is specified.
     +    '--untracked-files=' commandline parameter that does not align with
     +    their 'status.showuntrackedfiles' config value, then the untracked
     +    cache will be ignored - as it is for other unsupported situations like
     +    when a pathspec is specified.
      
          If the previously stored flags no longer match the current
     -    configuration, but the currently-applicable flags do match the
     -    current configuration, then the previously stored untracked cache
     -    data is discarded.
     +    configuration, but the currently-applicable flags do match the current
     +    configuration, then the previously stored untracked cache data is
     +    discarded.
      
     -    For most users there will be no change in behavior. Users who
     -    need '--untracked-files=all' to perform well will have the option
     -    of setting "status.showuntrackedfiles" to "all".
     +    For most users there will be no change in behavior. Users who need
     +    '--untracked-files=all' to perform well will have the option of
     +    setting "status.showuntrackedfiles" to "all".
      
          Users who need '--untracked-files=all' to perform well for their
          tooling AND prefer to avoid the verbosity of "all" when running
     -    git status explicitly... are out of luck for now.
     +    git status explicitly without options... are out of luck for now (no
     +    change).
      
     -    Users who set "status.showuntrackedfiles" to "all" and yet
     -    most frequently explicitly call
     -    'git status --untracked-files=normal' (and use the untracked
     -    cache) are the only users who will be explicitly disadvantaged
     -    by this change. These should be vanishingly rare, if there are
     -    any at all.
     +    Users who set "status.showuntrackedfiles" to "all" and yet most
     +    frequently explicitly call 'git status --untracked-files=normal' (and
     +    use the untracked cache) are the only ones who would be disadvantaged
     +    by this change. It seems unlikely there are any such users.
      
          Signed-off-by: Tao Klerks <tao@xxxxxxxxxx>
      


 dir.c | 86 ++++++++++++++++++++++++++++++++++++++++++++++-------------
 1 file changed, 68 insertions(+), 18 deletions(-)

diff --git a/dir.c b/dir.c
index d91295f2bcd..e35331d3f71 100644
--- a/dir.c
+++ b/dir.c
@@ -2746,13 +2746,33 @@ static void set_untracked_ident(struct untracked_cache *uc)
 	strbuf_addch(&uc->ident, 0);
 }
 
-static void new_untracked_cache(struct index_state *istate)
+static unsigned configured_default_dir_flags(struct index_state *istate)
+{
+	/* This logic is coordinated with the setting of these flags in
+	 * wt-status.c#wt_status_collect_untracked(), and the evaluation
+	 * of the config setting in commit.c#git_status_config()
+	 */
+	char *status_untracked_files_config_value;
+	int config_outcome = repo_config_get_string(istate->repo,
+								"status.showuntrackedfiles",
+								&status_untracked_files_config_value);
+	if (!config_outcome && !strcmp(status_untracked_files_config_value, "all")) {
+		return 0;
+	} else {
+		/*
+		 * The default, if "all" is not set, is "normal" - leading us here.
+		 * If the value is "none" then it really doesn't matter.
+		 */
+		return DIR_SHOW_OTHER_DIRECTORIES | DIR_HIDE_EMPTY_DIRECTORIES;
+	}
+}
+
+static void new_untracked_cache(struct index_state *istate, unsigned flags)
 {
 	struct untracked_cache *uc = xcalloc(1, sizeof(*uc));
 	strbuf_init(&uc->ident, 100);
 	uc->exclude_per_dir = ".gitignore";
-	/* should be the same flags used by git-status */
-	uc->dir_flags = DIR_SHOW_OTHER_DIRECTORIES | DIR_HIDE_EMPTY_DIRECTORIES;
+	uc->dir_flags = flags;
 	set_untracked_ident(uc);
 	istate->untracked = uc;
 	istate->cache_changed |= UNTRACKED_CHANGED;
@@ -2761,11 +2781,13 @@ static void new_untracked_cache(struct index_state *istate)
 void add_untracked_cache(struct index_state *istate)
 {
 	if (!istate->untracked) {
-		new_untracked_cache(istate);
+		new_untracked_cache(istate,
+				configured_default_dir_flags(istate));
 	} else {
 		if (!ident_in_untracked(istate->untracked)) {
 			free_untracked_cache(istate->untracked);
-			new_untracked_cache(istate);
+			new_untracked_cache(istate,
+					configured_default_dir_flags(istate));
 		}
 	}
 }
@@ -2781,10 +2803,12 @@ void remove_untracked_cache(struct index_state *istate)
 
 static struct untracked_cache_dir *validate_untracked_cache(struct dir_struct *dir,
 						      int base_len,
-						      const struct pathspec *pathspec)
+						      const struct pathspec *pathspec,
+							  struct index_state *istate)
 {
 	struct untracked_cache_dir *root;
 	static int untracked_cache_disabled = -1;
+	unsigned configured_dir_flags;
 
 	if (!dir->untracked)
 		return NULL;
@@ -2812,17 +2836,9 @@ static struct untracked_cache_dir *validate_untracked_cache(struct dir_struct *d
 	if (base_len || (pathspec && pathspec->nr))
 		return NULL;
 
-	/* Different set of flags may produce different results */
-	if (dir->flags != dir->untracked->dir_flags ||
-	    /*
-	     * See treat_directory(), case index_nonexistent. Without
-	     * this flag, we may need to also cache .git file content
-	     * for the resolve_gitlink_ref() call, which we don't.
-	     */
-	    !(dir->flags & DIR_SHOW_OTHER_DIRECTORIES) ||
-	    /* We don't support collecting ignore files */
-	    (dir->flags & (DIR_SHOW_IGNORED | DIR_SHOW_IGNORED_TOO |
-			   DIR_COLLECT_IGNORED)))
+	/* We don't support collecting ignore files */
+	if (dir->flags & (DIR_SHOW_IGNORED | DIR_SHOW_IGNORED_TOO |
+			DIR_COLLECT_IGNORED))
 		return NULL;
 
 	/*
@@ -2845,6 +2861,40 @@ static struct untracked_cache_dir *validate_untracked_cache(struct dir_struct *d
 		return NULL;
 	}
 
+	/* We don't support using or preparing the untracked cache if
+	 * the current effective flags don't match the configured
+	 * flags.
+	 */
+	configured_dir_flags = configured_default_dir_flags(istate);
+	if (dir->flags != configured_dir_flags)
+		return NULL;
+
+	/* If the untracked structure we received does not have the same flags
+	 * as configured, but the configured flags do match the effective flags,
+	 * then we need to reset / create a new "untracked" structure to match
+	 * the new config.
+	 * Keeping the saved and used untracked cache in-line with the
+	 * configuration provides an opportunity for frequent users of
+	 * "git status -uall" to leverage the untracked cache by aligning their
+	 * configuration (setting "status.showuntrackedfiles" to "all" or
+	 * "normal" as appropriate), where previously this option was
+	 * incompatible with untracked cache and *consistently* caused
+	 * surprisingly bad performance (with fscache and fsmonitor enabled) on
+	 * Windows.
+	 *
+	 * IMPROVEMENT OPPORTUNITY: If we reworked the untracked cache storage
+	 * to not be as bound up with the desired output in a given run,
+	 * and instead iterated through and stored enough information to
+	 * correctly serve both "modes", then users could get peak performance
+	 * with or without '-uall' regardless of their
+	 * "status.showuntrackedfiles" config.
+	 */
+	if (dir->flags != dir->untracked->dir_flags) {
+		free_untracked_cache(istate->untracked);
+		new_untracked_cache(istate, configured_dir_flags);
+		dir->untracked = istate->untracked;
+	}
+
 	if (!dir->untracked->root)
 		FLEX_ALLOC_STR(dir->untracked->root, name, "");
 
@@ -2916,7 +2966,7 @@ int read_directory(struct dir_struct *dir, struct index_state *istate,
 		return dir->nr;
 	}
 
-	untracked = validate_untracked_cache(dir, len, pathspec);
+	untracked = validate_untracked_cache(dir, len, pathspec, istate);
 	if (!untracked)
 		/*
 		 * make sure untracked cache code path is disabled,

base-commit: dab1b7905d0b295f1acef9785bb2b9cbb0fdec84
-- 
gitgitgadget



[Index of Archives]     [Linux Kernel Development]     [Gcc Help]     [IETF Annouce]     [DCCP]     [Netdev]     [Networking]     [Security]     [V4L]     [Bugtraq]     [Yosemite]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Linux SCSI]     [Fedora Users]

  Powered by Linux