From: Elijah Newren <newren@xxxxxxxxx> I was going to title this "dir: more synchronizing of treat_leading_path() and read_directory_recursive()", a nod to commit 777b42034764 ("dir: synchronize treat_leading_path() and read_directory_recursive()", 2019-12-19), but the title was too long. Anyway, first the backstory... fill_directory() has always had a slightly error-prone interface: it returns a subset of paths which *might* match the specified pathspec; it was intended to prune away some paths which didn't match the specified pathspec and keep at least all the ones that did match it. Given this interface, callers were responsible to post-process the results and check whether each actually matched the pathspec. builtin/clean.c did this. It would first prune out duplicates (e.g. if "dir" was returned as well as all files under "dir/", then it would simplify this to just "dir"), and after pruning duplicates it would compare the remaining paths to the specified pathspec(s). This post-processing itself could run into problems, though, as noted in commit 404ebceda01c ("dir: also check directories for matching pathspecs", 2019-09-17): For the case of git-clean and a set of pathspecs of "dir/file" and "more", this caused a problem because we'd end up with dir entries for both of "dir" "dir/file" Then correct_untracked_entries() would try to helpfully prune duplicates for us by removing "dir/file" since it's under "dir", leaving us with "dir" Since the original pathspec only had "dir/file", the only entry left doesn't match and leaves nothing to be removed. (Note that if only one pathspec was specified, e.g. only "dir/file", then the common_prefix_len optimizations in fill_directory would cause us to bypass this problem, making it appear in simple tests that we could correctly remove manually specified pathspecs.) That commit fixed the issue -- when multiple pathspecs were specified -- by making sure fill_directory() wouldn't return both "dir" and "dir/file" outside the common_prefix_len optimization path. This is where it starts to get fun. In commit b9670c1f5e6b ("dir: fix checks on common prefix directory", 2019-12-19), we noticed that the common_prefix_len wasn't doing appropriate checks and letting all kinds of stuff through, resulting in recursing into .git/ directories and other craziness. So it started locking down and doing checks on pathnames within that code path. That continued with commit 777b42034764 ("dir: synchronize treat_leading_path() and read_directory_recursive()", 2019-12-19), which noted the following: Our optimization to avoid calling into read_directory_recursive() when all pathspecs have a common leading directory mean that we need to match the logic that read_directory_recursive() would use if we had just called it from the root. Since it does more than call treat_path() we need to copy that same logic. ...and then it more forcefully addressed the issue with this wonderfully ironic statement: Needing to duplicate logic like this means it is guaranteed someone will eventually need to make further changes and forget to update both locations. It is tempting to just nuke the leading_directory special casing to avoid such bugs and simplify the code, but unpack_trees' verify_clean_subdirectory() also calls read_directory() and does so with a non-empty leading path, so I'm hesitant to try to restructure further. Add obnoxious warnings to treat_leading_path() and read_directory_recursive() to try to warn people of such problems. You would think that with such a strongly worded description, that its author would have actually ensured that the logic in treat_leading_path() and read_directory_recursive() did actually match and that *everything* that was needed had at least been copied over at the time that this paragraph was written. But you'd be wrong, I messed it up by missing part of the logic. Copy the missing bits to fix the new final test in t7300. Signed-off-by: Elijah Newren <newren@xxxxxxxxx> --- dir.c | 4 ++++ t/t7300-clean.sh | 2 +- 2 files changed, 5 insertions(+), 1 deletion(-) diff --git a/dir.c b/dir.c index 7d255227b1..5d4c92d3aa 100644 --- a/dir.c +++ b/dir.c @@ -2383,6 +2383,10 @@ static int treat_leading_path(struct dir_struct *dir, (dir->flags & DIR_SHOW_IGNORED_TOO || do_match_pathspec(istate, pathspec, sb.buf, sb.len, baselen, NULL, DO_MATCH_LEADING_PATHSPEC) == MATCHED_RECURSIVELY_LEADING_PATHSPEC)) { + if (!match_pathspec(istate, pathspec, sb.buf, sb.len, + 0 /* prefix */, NULL, + 0 /* do NOT special case dirs */)) + state = path_none; add_path_to_appropriate_result_list(dir, NULL, &cdir, istate, &sb, baselen, diff --git a/t/t7300-clean.sh b/t/t7300-clean.sh index 782e125c89..cb5e34d94c 100755 --- a/t/t7300-clean.sh +++ b/t/t7300-clean.sh @@ -737,7 +737,7 @@ test_expect_success MINGW 'handle clean & core.longpaths = false nicely' ' test_i18ngrep "too long" .git/err ' -test_expect_failure 'clean untracked paths by pathspec' ' +test_expect_success 'clean untracked paths by pathspec' ' git init untracked && mkdir untracked/dir && echo >untracked/dir/file.txt && -- gitgitgadget