Re: [PATCH v2 16/19] parallel-checkout: add tests for basic operations

Matheus Tavares Bernardino <matheus.bernardino@xxxxxx> · Tue, 20 Oct 2020 00:18:21 -0300

Hi, Jonathan

On Mon, Oct 19, 2020 at 10:36 PM Jonathan Nieder <jrnieder@xxxxxxxxx> wrote:
>
> Hi,
>
> Matheus Tavares wrote:
>
> > Add tests to populate the working tree during clone and checkout using
> > the sequential and parallel modes, to confirm that they produce
> > identical results. Also test basic checkout mechanics, such as checking
> > for symlinks in the leading directories and the abidance to --force.
>
> Thanks for implementing parallel checkout!  I'm excited about the
> feature.  And thanks for including these tests.

Thanks for the comments and feedback :)

> [...]
> > --- /dev/null
> > +++ b/t/lib-parallel-checkout.sh
> > @@ -0,0 +1,39 @@
> [...]
> > +# Runs `git -c checkout.workers=$1 -c checkout.thesholdForParallelism=$2 ${@:4}`
> > +# and checks that the number of workers spawned is equal to $3.
> > +git_pc()
>
> nit: what does git_pc mean?

The idea was "git w/ parallel-checkout". But I realize it may have
gotten too abbreviated...

> Can this spell it out more verbosely, or
> could callers take on more of the burden?  (Perhaps it would make sense
> to use a helper that uses test_config to set the relevant configuration,
> and then the caller can use plain "git clone"?)

Hmm, it's possible, but I think we might end up with quite a lot of
repetition (to always check that checkout spawned the right number of
workers).

> [...]
> > +     GIT_TRACE2="$(pwd)/trace" git \
> > +             -c checkout.workers=$workers \
> > +             -c checkout.thresholdForParallelism=$threshold \
> > +             -c advice.detachedHead=0 \
> > +             $@ &&
>
> $@ needs to be quoted, or else it will act like $* (and in particular it
> won't handle parameters with embedded spaces).

Nice catch, thanks! I will send a patch for this tomorrow.

> > +
> > +     # Check that the expected number of workers has been used. Note that it
> > +     # can be different than the requested number in two cases: when the
> > +     # quantity of entries to be checked out is less than the number of
> > +     # workers; and when the threshold has not been reached.
> > +     #
> > +     local workers_in_trace=$(grep "child_start\[.\+\] git checkout--helper" trace | wc -l) &&
>
> Do we use grep's \+ operator in other tests?  I thought we preferred to
> use the more portable *, but it may be that I'm out of date.

Oh, I didn't know about the portability issue with \+. This is already
in `next`, but I guess it's worth sending a follow-up patch to fix it,
right? (I see we have a second \+ occurrence in t7508, which could be
changed in the same patch.)

> [...]
> > +# Verify that both the working tree and the index were created correctly
> > +verify_checkout()
> > +{
> > +     git -C $1 diff-index --quiet HEAD -- &&
> > +     git -C $1 diff-index --quiet --cached HEAD -- &&
> > +     git -C $1 status --porcelain >$1.status &&
> > +     test_must_be_empty $1.status
> > +}
>
> Like git_pc, this is not easy to take in at a glance.
>
> "$1" needs to be quoted if we are to handle paths with spaces.

Thanks, again :) Currently, this function doesn't get paths with
spaces, but I agree that it's better to be cautious here.

> [...]
> > --- /dev/null
> > +++ b/t/t2080-parallel-checkout-basics.sh
> > @@ -0,0 +1,197 @@
> > +#!/bin/sh
> > +
> > +test_description='parallel-checkout basics
> > +
> > +Ensure that parallel-checkout basically works on clone and checkout, spawning
> > +the required number of workers and correctly populating both the index and
> > +working tree.
> > +'
> > +
> > +TEST_NO_CREATE_REPO=1
> > +. ./test-lib.sh
> > +. "$TEST_DIRECTORY/lib-parallel-checkout.sh"
> > +
> > +# NEEDSWORK: cloning a SHA1 repo with GIT_TEST_DEFAULT_HASH set to "sha256"
> > +# currently produces a wrong result (See
> > +# https://lore.kernel.org/git/20200911151717.43475-1-matheus.bernardino@xxxxxx/).
> > +# So we skip the "parallel-checkout during clone" tests when this test flag is
> > +# set to "sha256". Remove this when the bug is fixed.
> > +#
> > +if test "$GIT_TEST_DEFAULT_HASH" = "sha256"
> > +then
> > +     skip_all="t2080 currently don't work with GIT_TEST_DEFAULT_HASH=sha256"
> > +     test_done
> > +fi
> > +
> > +R_BASE=$GIT_BUILD_DIR
> > +
> > +test_expect_success 'sequential clone' '
> > +     git_pc 1 0 0 clone --quiet -- $R_BASE r_sequential &&
>
> This fails when I run it when building from a tarball, which is
> presenting me from releasing this patch series to Debian experimental.

Sorry for the trouble :( It didn't occur to me, while writing the
test, that it could also be run from the tarball.

> Can we use an artificial repo instead of git.git?  Using git.git as
> test data seems like a recipe for hard-to-reproduce test failures.

I think we could maybe drop these tests. There are already some
similar tests below these, which use an artificial repository. The
goal of using git.git in this section was to test parallel-checkout
with a real-world repo, and hopefully catch errors that we might not
see with small artificial ones.  But you have a very valid concern, as
well. Hmm, I'm not sure what is the best solution to this case. What
do you think?