NOTE: This series builds on sg/clean-nested-repo-with-ignored, as it (among other things) modifies his testcase from expect_failure to expect_success. Also, Peff is probably the only one who remembers v1 (and even he may have forgotten it): v1 was posted a year and a half ago. This patch series fixes a few issues with git-clean: * Failure to clean when multiple pathspecs are specified, reported both in April 2018[1] and again in May 2019[2]. * Failure to preserve both tracked and untracked files within a nested Git repository reported a few weeks ago by SZEDER[3]. [1] https://public-inbox.org/git/20180405173446.32372-4-newren@xxxxxxxxx/ [2] https://public-inbox.org/git/20190531183651.10067-1-rafa.almas@xxxxxxxxx/ [3] https://public-inbox.org/git/20190825185918.3909-1-szeder.dev@xxxxxxxxx/ Changes since v2: * Removed the RFC label * Fixed up a few things SZEDER pointed out in v2 -- some commit message improvements, and an extra safety precaution for the final patch. Stuff I'd like reviewer to focus on: * Read the (long) commit messages for patches 9 & 10; do folks agree with my declaration of "correct" behavior and my changes to the few regression tests in those patches? Elijah Newren (12): t7300: add testcases showing failure to clean specified pathspecs dir: fix typo in comment dir: fix off-by-one error in match_pathspec_item dir: also check directories for matching pathspecs dir: make the DO_MATCH_SUBMODULE code reusable for a non-submodule case dir: if our pathspec might match files under a dir, recurse into it dir: add commentary explaining match_pathspec_item's return value git-clean.txt: do not claim we will delete files with -n/--dry-run clean: disambiguate the definition of -d clean: avoid removing untracked files in a nested git repository clean: rewrap overly long line clean: fix theoretical path corruption Documentation/git-clean.txt | 16 +++++----- builtin/clean.c | 15 +++++++-- dir.c | 63 +++++++++++++++++++++++++++---------- dir.h | 8 +++-- t/t7300-clean.sh | 44 +++++++++++++++++++++++--- 5 files changed, 112 insertions(+), 34 deletions(-) Range-diff: 1: 82328e2033 ! 1: fe35ab8cc3 t7300: Add some testcases showing failure to clean specified pathspecs @@ Metadata Author: Elijah Newren <newren@xxxxxxxxx> ## Commit message ## - t7300: Add some testcases showing failure to clean specified pathspecs + t7300: add testcases showing failure to clean specified pathspecs Someone brought me a testcase where multiple git-clean invocations were required to clean out unwanted files: 2: 5c1f58fd9d = 2: 707d287d79 dir: fix typo in comment 3: 0e8b415af3 = 3: bb316e82b2 dir: fix off-by-one error in match_pathspec_item 4: 30b3ede443 ! 4: 56319f934a dir: Directories should be checked for matching pathspecs too @@ Metadata Author: Elijah Newren <newren@xxxxxxxxx> ## Commit message ## - dir: Directories should be checked for matching pathspecs too + dir: also check directories for matching pathspecs Even if a directory doesn't match a pathspec, it is possible, depending on the precise pathspecs, that some file underneath it might. So we 5: bab01f4cda ! 5: 81593a565c dir: Make the DO_MATCH_SUBMODULE code reusable for a non-submodule case @@ Metadata Author: Elijah Newren <newren@xxxxxxxxx> ## Commit message ## - dir: Make the DO_MATCH_SUBMODULE code reusable for a non-submodule case + dir: make the DO_MATCH_SUBMODULE code reusable for a non-submodule case The specific checks done in match_pathspec_item for the DO_MATCH_SUBMODULE case are useful for other cases which have nothing to do with submodules. 6: c619ab4b3e ! 6: 9566823a0f dir: If our pathspec might match files under a dir, recurse into it @@ Metadata Author: Elijah Newren <newren@xxxxxxxxx> ## Commit message ## - dir: If our pathspec might match files under a dir, recurse into it + dir: if our pathspec might match files under a dir, recurse into it For git clean, if a directory is entirely untracked and the user did not specify -d (corresponding to DIR_SHOW_IGNORED_TOO), then we usually do @@ Commit message the given directory which match one of the pathspecs being added to the entries list. - Two particular considerations for this patch: + Two notes of potential interest to future readers: - * If we want to only recurse into a directory when it is specifically + * If we wanted to only recurse into a directory when it is specifically matched rather than matched-via-glob (e.g. '*.c'), then we could do so via making the final non-zero return in match_pathspec_item be MATCHED_RECURSIVELY instead of MATCHED_RECURSIVELY_LEADING_PATHSPEC. - (See final patch of this RFC series for details; note that the - relative order of MATCHED_RECURSIVELY_LEADING_PATHSPEC and - MATCHED_RECURSIVELY are important for such a change.)) + (Note that the relative order of MATCHED_RECURSIVELY_LEADING_PATHSPEC + and MATCHED_RECURSIVELY are important for such a change.) I was + leaving open that possibility while writing an RFC asking for the + behavior we want, but even though we don't want it, that knowledge + might help you understand the code flow better. * There is a growing amount of logic in read_directory_recursive() for - deciding whether to recurse into a subdirectory. However, there is - a comment immediately preceding this logic that says to recurse if + deciding whether to recurse into a subdirectory. However, there is a + comment immediately preceding this logic that says to recurse if instructed by treat_path(). It may be better for the logic in - read_directory_recursive() to be moved to treat_path() (or another - function it calls, such as treat_directory()), but I did not feel - strongly about this and just left the logic where it was while - adding to it. Do others have strong opinions on this? + read_directory_recursive() to ultimately be moved to treat_path() (or + another function it calls, such as treat_directory()), but I have + left that for someone else to tackle in the future. Signed-off-by: Elijah Newren <newren@xxxxxxxxx> 7: 5e1686a30e = 7: 7821898ba7 dir: add commentary explaining match_pathspec_item's return value 8: d92637f961 = 8: 13def5df57 git-clean.txt: do not claim we will delete files with -n/--dry-run 9: 4550a30df8 = 9: e6b274abf7 clean: disambiguate the definition of -d 10: 0985be2793 ! 10: 5f4ef14765 clean: avoid removing untracked files in a nested git repository @@ Commit message Also, clean up some grammar errors describing this functionality in the git-clean manpage. - Finally, there is one somewhat related bug which this patch does not - fix, coming from the opposite angle. If the user runs - git clean -ffd - to force deletion of untracked nested repositories, and within an - untracked nested repo the user has ignored files (according to the inner - OR outer repositories' .gitignore), then not only will those ignored - files be left alone but the .git/ subdirectory of the nested repo will - be left alone too. I am not completely sure if this should be - considered a bug (though it seems like it since the lack of the - untracked file would result in the .git/ subdirectory being deleted), - but in any event it is very minor compared to accidentally deleting user - data and I did not dive into it. + Finally, there are still a couple bugs with -ffd not cleaning out enough + (e.g. missing the nested .git) and with -ffdX possibly cleaning out the + wrong files (paying attention to outer .gitignore instead of inner). + This patch does not address these cases at all (and does not change the + behavior relative to those flags), it only fixes the handling when given + a single -f. See + https://public-inbox.org/git/20190905212043.GC32087@xxxxxxxxxx/ for more + discussion of the -ffd[X?] bugs. Signed-off-by: Elijah Newren <newren@xxxxxxxxx> 11: 2d36e3f7cb = 11: 4e30e62eb1 clean: rewrap overly long line 12: 3c5d1ff16a < -: ---------- clean: fix theoretical path corruption -: ---------- > 12: de2444f7cb clean: fix theoretical path corruption -- 2.23.0.173.gad11b3a635.dirty