On Thu, May 6, 2021 at 9:27 PM Eric Sunshine <sunshine@xxxxxxxxxxxxxx> wrote:
>
> On Fri, May 7, 2021 at 12:05 AM Elijah Newren via GitGitGadget
> <gitgitgadget@xxxxxxxxx> wrote:
> > PNPM is apparently creating deeply nested (but ignored) directory
> > structures; traversing them is costly performance-wise, unnecessary, and
> > in some cases is even throwing warnings/errors because the paths are too
> > long to handle on various platforms. Add a testcase that demonstrates
> > this problem.
> >
> > Signed-off-by: Elijah Newren <newren@xxxxxxxxx>
> > ---
> > diff --git a/t/t7300-clean.sh b/t/t7300-clean.sh
> > @@ -746,4 +746,44 @@ test_expect_success 'clean untracked paths by pathspec' '
> > +test_expect_failure 'avoid traversing into ignored directories' '
> > +	test_when_finished rm -f output error &&
> > +	test_create_repo avoid-traversing-deep-hierarchy &&
> > +	(
> > +		cd avoid-traversing-deep-hierarchy &&
> > +
> > +		>directory-random-file.txt &&
> > +		# Put this file under directory400/directory399/.../directory1/
> > +		depth=400 &&
> > +		for x in $(test_seq 1 $depth); do
> > +			mkdir "tmpdirectory$x" &&
> > +			mv directory* "tmpdirectory$x" &&
> > +			mv "tmpdirectory$x" "directory$x"
> > +		done &&
>
> Is this expensive/slow loop needed because you'd otherwise run afoul
> of command-line length limits on some platforms if you tried creating
> the entire mess of directories with a single `mkdir -p`?

The whole point is creating a path long enough that it runs afoul of
limits, yes. If we had an alternative way to check whether dir.c
actually recursed into a directory, then I could dispense with this
and just have a single directory (and it could be named a single
character long for that matter too), but I don't know of a good way
to do that.

(Some possibilities I considered along that route are mentioned at
https://lore.kernel.org/git/CABPp-BF3e+MWQAGb6ER7d5jqjcV=kYqQ2stM_oDyaqvonPPPSw@xxxxxxxxxxxxxx/)

> > +		git clean -ffdxn -e directory$depth >../output 2>../error &&
> > +
> > +		test_must_be_empty ../output &&
> > +		# We especially do not want things like
> > +		# "warning: could not open directory "
> > +		# appearing in the error output. It is true that directories
> > +		# that are too long cannot be opened, but we should not be
> > +		# recursing into those directories anyway since the very first
> > +		# level is ignored.
> > +		test_must_be_empty ../error &&
> > +
> > +		# alpine-linux-musl fails to "rm -rf" a directory with such
> > +		# a deeply nested hierarchy. Help it out by deleting the
> > +		# leading directories ourselves. Super slow, but, what else
> > +		# can we do? Without this, we will hit a
> > +		#     error: Tests passed but test cleanup failed; aborting
> > +		# so do this ugly manual cleanup...
> > +		while test ! -f directory-random-file.txt; do
> > +			name=$(ls -d directory*) &&
> > +			mv $name/* . &&
> > +			rmdir $name
> > +		done
>
> Shouldn't this cleanup loop be under the control of
> test_when_finished() to ensure it is invoked regardless of how the
> test exits?

I thought about that, but if the test fails, it seems nicer to leave
everything behind so it can be inspected. It's similar to test_done,
which will only delete the $TRASH_DIRECTORY if all the tests passed.

So no, I don't think this should be under the control of
test_when_finished.
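For anyone following along, here is a hypothetical standalone sketch of
the "wrap" loop from the test, reduced to a depth of 5 for illustration
(the names "demo" and the depth are mine, not from the patch). The key
property is that every mkdir/mv invocation only ever handles
single-component names, so no individual command sees the full path,
which at depth 400 would exceed PATH_MAX on many platforms:

```shell
#!/bin/sh
# Sketch only: build directory5/directory4/.../directory1/ by
# repeatedly wrapping the existing tree in a new parent directory.
# Each filesystem operation touches exactly one path component.
depth=5
mkdir demo && cd demo || exit 1
: >directory-random-file.txt
for x in $(seq 1 $depth); do
	mkdir "tmpdirectory$x" &&
	mv directory* "tmpdirectory$x" &&
	mv "tmpdirectory$x" "directory$x" || exit 1
done
# The file now sits at directory5/directory4/.../directory1/
find . -name directory-random-file.txt
cd ..
```

At depth 400 the same loop produces a path long enough that opening the
leaf directories fails, which is exactly the condition the test wants
git clean to avoid triggering.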
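The manual cleanup can be sketched the same way; this self-contained
example (directory names and depth 5 are mine, for illustration) first
builds a small hierarchy, then dismantles it as the test does: hoist
the contents of the lone top-level directoryN up one level and rmdir
the emptied wrapper, so no single command has to walk the full depth:

```shell
#!/bin/sh
# Sketch only: build a small nested hierarchy, then unwrap it one
# level at a time instead of relying on "rm -rf" to recurse.
mkdir demo-cleanup && cd demo-cleanup || exit 1
: >directory-random-file.txt
for x in 1 2 3 4 5; do
	mkdir "tmpdirectory$x" &&
	mv directory* "tmpdirectory$x" &&
	mv "tmpdirectory$x" "directory$x" || exit 1
done
# Unwrap: when directory-random-file.txt reappears at the top level,
# every wrapper directory has been removed.
while test ! -f directory-random-file.txt; do
	name=$(ls -d directory*) &&
	mv "$name"/* . &&
	rmdir "$name" || exit 1
done
ls	# only directory-random-file.txt remains
cd ..
```

As with the build loop, each iteration is one rename plus one rmdir of
a single component, which is what lets fragile rm implementations (the
alpine-linux-musl case above) off the hook.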