Am 25.10.21 um 23:07 schrieb Matheus Tavares: > These three commands recently learned to avoid updating paths outside > the sparse checkout even if they are missing the SKIP_WORKTREE bit. This > is done using path_in_sparse_checkout(), which checks whether a given > path matches the current list of sparsity rules, similar to what > clear_ce_flags() does when we run "git sparse checkout init" or "git > sparse-checkout reapply". However, clear_ce_flags() uses a recursive > approach, applying the match results from parent directories on paths > that get the UNDECIDED result, whereas path_in_sparse_checkout() only > attempts to match the full path and immediately considers UNDECIDED as > NOT_MATCHED. This makes the function miss matches with leading > directories. For example, if the user has the sparsity patterns "!/a" > and "b/", add, rm, and mv will fail to update the path "a/b/c" and end > up displaying a warning about it being outside the sparse checkout even > though it isn't. This problem only occurs in full pattern mode as the > pattern matching functions never return UNDECIDED for cone mode. > > To fix this, replicate the recursive behavior of clear_ce_flags() in > path_in_sparse_checkout(), falling back to the parent directory match > when a path gets the UNDECIDED result. (If this turns out to be too > expensive in some cases, we may want to later add some form of caching > to accelerate multiple queries within the same directory. This is not > implemented in this patch, though.) Also add two tests for each affected > command (add, rm, and mv) to check that they behave correctly with the > recursive pattern matching. The first test would previously fail without > this patch while the second already succeeded. It is added mostly to > make sure that we are not breaking the existing pattern matching for > directories that are really sparse, and also as a protection against any > future regressions. > > Two other existing tests had to be changed as well: one test in t3602 > checks that "git rm -r <dir>" won't remove sparse entries, but it didn't > allow the non-sparse entries inside <dir> to be removed. The other one, > in t7002, tested that "git mv" would correctly display a warning message > for sparse paths, but it accidentally expected the message to include > two non-sparse paths as well. > > Signed-off-by: Matheus Tavares <matheus.bernardino@xxxxxx> > --- > > Changes since RFC/v1 [1]: > > - Inverted the loop direction to start from the full path and go backwards in > the parent dirs. This way we can stop early when we find the first > non-UNDECIDED match result. > > - Simplified the implementation by unifing the code path for cone mode and > full pattern mode. Since path_matches_pattern_list() never returns UNDECIDED > for cone mode, it will always execute only one iteration of the loop and then > find the final answer. There is no need to handle this case in a separate > block. > > - Inside the loop, made sure to change dtype to DT_DIR when going to parent > directories. Without this, the pattern match would fail if we had a path > like "a/b/c" and the pattern "b/" (with trailing slash). > > - Changed the tests to use trailing slash to make sure they cover the corner > case described above. > > - Improved commit message. > > [1]: https://lore.kernel.org/git/80b5ba61861193daf7132aa64b65fc7dde90dacb.1634866698.git.matheus.bernardino@xxxxxx > (The RFC was deep down another thread, so I separated v2 to help > readers. Please, let me know if that is not a good approach and I will > avoid it in the future.) > > dir.c | 25 +++++++++++++++++------ > t/t3602-rm-sparse-checkout.sh | 37 +++++++++++++++++++++++++++++++--- > t/t3705-add-sparse-checkout.sh | 18 +++++++++++++++++ > t/t7002-mv-sparse-checkout.sh | 24 ++++++++++++++++++++-- > 4 files changed, 93 insertions(+), 11 deletions(-) > > diff --git a/dir.c b/dir.c > index a4306ab874..248f72e732 100644 > --- a/dir.c > +++ b/dir.c > @@ -1504,8 +1504,9 @@ static int path_in_sparse_checkout_1(const char *path, > struct index_state *istate, > int require_cone_mode) > { > - const char *base; > int dtype = DT_REG; > + enum pattern_match_result match = UNDECIDED; > + const char *end, *slash; > > /* > * We default to accepting a path if there are no patterns or > @@ -1516,11 +1517,23 @@ static int path_in_sparse_checkout_1(const char *path, > !istate->sparse_checkout_patterns->use_cone_patterns)) > return 1; > > - base = strrchr(path, '/'); > - return path_matches_pattern_list(path, strlen(path), base ? base + 1 : path, > - &dtype, > - istate->sparse_checkout_patterns, > - istate) > 0; > + /* > + * If UNDECIDED, use the match from the parent dir (recursively), > + * or fall back to NOT_MATCHED at the topmost level. > + */ > + for (end = path + strlen(path); end > path && match == UNDECIDED; end = slash) { > + > + for (slash = end - 1; slash >= path && *slash != '/'; slash--) > + ; /* do nothing */ slash can end up one less than path. If path points to the first char of a string object this would be undefined if I read 6.5.6 of C99 correctly. (A pointer to the array element just after the last one is specified as fine as long as it's not dereferenced, but a pointer to the element before the first one is not mentioned and thus undefined.) Do you really need the ">=" instead of ">"? > + > + match = path_matches_pattern_list(path, end - path, > + slash >= path ? slash + 1 : path, &dtype, > + istate->sparse_checkout_patterns, istate); > + > + /* We are going to match the parent dir now */ > + dtype = DT_DIR; > + } > + return match > 0; > } > > int path_in_sparse_checkout(const char *path,