Re: [PATCH 1/5] t7300: add testcase showing unnecessary traversal into ignored directory

Elijah Newren <newren@xxxxxxxxx> · Fri, 7 May 2021 17:04:24 -0700

On Fri, May 7, 2021 at 4:05 PM Jeff King <peff@xxxxxxxx> wrote:
>
> On Thu, May 06, 2021 at 10:00:49PM -0700, Elijah Newren wrote:
>
> > > > +               >directory-random-file.txt &&
> > > > +               # Put this file under directory400/directory399/.../directory1/
> > > > +               depth=400 &&
> > > > +               for x in $(test_seq 1 $depth); do
> > > > +                       mkdir "tmpdirectory$x" &&
> > > > +                       mv directory* "tmpdirectory$x" &&
> > > > +                       mv "tmpdirectory$x" "directory$x"
> > > > +               done &&
> > >
> > > Is this expensive/slow loop needed because you'd otherwise run afoul
> > > of command-line length limits on some platforms if you tried creating
> > > the entire mess of directories with a single `mkdir -p`?
> >
> > The whole point is creating a path long enough that it runs afoul of
> > limits, yes.
> >
> > If we had an alternative way to check whether dir.c actually recursed
> > into a directory, then I could dispense with this and just have a
> > single directory (and it could be named a single character long for
> > that matter too), but I don't know of a good way to do that.  (Some
> > possibilities I considered along that route are mentioned at
> > https://lore.kernel.org/git/CABPp-BF3e+MWQAGb6ER7d5jqjcV=kYqQ2stM_oDyaqvonPPPSw@xxxxxxxxxxxxxx/)
>
> I don't have a better way of checking the dir.c behavior. But I think
> the other half of Eric's question was: why can't we do this setup way
> more efficiently with "mkdir -p"?

I think I figured it out.  I now have the test simplified down to just:

test_expect_success 'avoid traversing into ignored directories' '
    test_when_finished rm -f output error trace.* &&
    test_create_repo avoid-traversing-deep-hierarchy &&
    (
        cd avoid-traversing-deep-hierarchy &&

        mkdir -p untracked/subdir/with/a &&
        >untracked/subdir/with/a/random-file.txt &&

        GIT_TRACE2_PERF="$TRASH_DIRECTORY/trace.output" \
        git clean -ffdxn -e untracked &&

        grep data.*read_directo.*visited ../trace.output \
            | cut -d "|" -f 9 >../trace.relevant &&
        cat >../trace.expect <<-EOF &&
        directories-visited:1
        paths-visited:4
        EOF
        test_cmp ../trace.expect ../trace.relevant
    )
'

This relies on two extra changes to the code: (1) switching the
existing trace calls in dir.c over to their trace2 variants, and (2)
adding two new counters (visited_directories and visited_paths) that
are output using the trace2 framework.  I'm a little unsure whether I
should check the paths-visited counter (will some platform have
additional files in every directory besides '.' and '..'?  Or lack
one of those?), but it is good to have the test check that the code
in this case visits no directories other than the toplevel one (i.e.
that directories-visited is 1).
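
For anyone wanting to see what the grep/cut step in the test does in
isolation, here is a rough sketch.  The sample lines are made up for
illustration (they are not captured from a real GIT_TRACE2_PERF run,
and the real column contents will differ); the only assumption is a
"|"-separated layout whose ninth column carries the counter, matching
the "-f 9" used above.  sample_trace and extract_counters are
hypothetical helper names.

```shell
# Hypothetical sample lines in a "|"-separated layout; NOT real git output.
sample_trace() {
	cat <<\EOF
t0 dir.c:1 | d0 | main | data | r1 | 0.1 | 0.1 | read_directo | directories-visited:1
t0 dir.c:2 | d0 | main | data | r1 | 0.1 | 0.1 | read_directo | paths-visited:4
t0 git.c:3 | d0 | main | exit | r1 | 0.2 | 0.2 | | code:0
EOF
}

# Same filter as in the test: keep the matching "data" lines,
# then print only the ninth "|"-separated column.
extract_counters() {
	grep "data.*read_directo.*visited" | cut -d "|" -f 9
}

sample_trace | extract_counters
```

Note that cut keeps whatever whitespace follows the delimiter, so each
extracted line starts with a space; the expected file has to account
for that for test_cmp to pass.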

New patches incoming shortly...