Re: [PATCH v2 2/9] t5526: use grep to assert on fetches

Glen Choo <chooglen@xxxxxxxxxx> · Wed, 16 Feb 2022 11:09:15 +0800

Ævar Arnfjörð Bjarmason <avarab@xxxxxxxxx> writes:

> On Wed, Feb 16 2022, Glen Choo wrote:
>
>> In a previous commit, we replaced test_cmp invocations with
>> verify_fetch_result(). Finish the process of removing test_cmp by using
>> grep in verify_fetch_result() instead.
>>
>> This makes the tests less sensitive to changes because, instead of
>> checking the whole stderr, we only grep for the lines of the form
>>
>> * "<old-head>..<new-head>\s+branch\s+-> origin/branch"
>> * "Fetching submodule <submodule-path>" (if fetching a submodule)
>>
>> when we expect the repo to have fetched. If we expect the repo to not
>> have fetched, grep to make sure the lines are absent. Also, simplify the
>> assertions by using grep patterns that match only the relevant pieces of
>> information, e.g. <old-head> is irrelevant because we only want to know
>> if the fetch was performed, so we don't need to know where the branch
>> was before the fetch.
>
> I tried ejecting 1/9 and 2/9 out of this series locally, and it passes
> all tests until the new tests you add in 7/9.
>
> As ugly as some of the pre-image is, I wonder if dropping these first
> two and biting the bullet and just continuing with the test_cmp is
> better.
>
> The test_cmp is going to catch issues that even the cleverest grep
> combinations won't, e.g. in the submodule-in-C series I discovered a bug
> where all of our testing & the existing series hadn't spotted that we
> were dropping a \n at the end in one of the messages.

I think there are two schools of thought on how to test informational
messages:

- assert an exact match on the exact output that we generate
- assert that the output contains the pieces of information we care
  about

These two approaches are virtually opposites on two axes - the former
will catch unintentional changes (like you've noted) and the latter
saves on maintenance effort and tends to be less noisy in tests.

Personally, I'm a bit torn between both approaches in general because I
want tests to be maintainable (testing the exact output is a bit of an
antipattern at Google), but I'm not very comfortable with the fact that
unintended changes can sneak through.

So I don't think there's a correct answer in general. Maybe an
acceptable rule of thumb is that test_cmp is good until it starts
getting in the way of reading and writing understandable tests.

If we agree on that rule, then for this patch, I think replacing
test_cmp is the way to go, primarily because it lets us ignore the 'old
head' of the branch before the fetch, e.g. in the quoted example..

>>  test_expect_success setup '
>> @@ -274,13 +277,10 @@ test_expect_success "Recursion doesn't happen when no new commits are fetched in
>>  '
>>  
>>  test_expect_success "Recursion stops when no new submodule commits are fetched" '
>> -	head1=$(git rev-parse --short HEAD) &&
>>  	git add submodule &&
>>  	git commit -m "new submodule" &&
>> -	head2=$(git rev-parse --short HEAD) &&
>> -	echo "From $pwd/." > expect.err.super &&
>> -	echo "   $head1..$head2  super      -> origin/super" >>expect.err.super &&
>
> ...as opposed to if we just rolled the generation of this into some
> utility printf function.

we'd still have to deal with $head1 if we use test_cmp. That's fine for
this test, because it's pretty simple, but it gets pretty janky later
on:

  @@ -345,20 +339,13 @@ test_expect_success "Recursion picks up all submodules when necessary" '
        git fetch &&
        git checkout -q FETCH_HEAD
      ) &&
  -		head1=$(git rev-parse --short HEAD^) &&
      git add subdir/deepsubmodule &&
      git commit -m "new deepsubmodule" &&
  -		head2=$(git rev-parse --short HEAD) &&
  -		echo "Fetching submodule submodule" > ../expect.err.sub &&
  -		echo "From $pwd/submodule" >> ../expect.err.sub &&
  -		echo "   $head1..$head2  sub        -> origin/sub" >> ../expect.err.sub
  +		git rev-parse --short HEAD >../subhead
    ) &&
  -	head1=$(git rev-parse --short HEAD) &&
    git add submodule &&
    git commit -m "new submodule" &&
  -	head2=$(git rev-parse --short HEAD) &&
  -	echo "From $pwd/." > expect.err.super &&
  -	echo "   $head1..$head2  super      -> origin/super" >> expect.err.super &&
  +	git rev-parse --short HEAD >superhead &&
    (
      cd downstream &&
      git fetch >../actual.out 2>../actual.err

In this example, we have two $head1 variables in different subshells,
one of which is HEAD, but the other is HEAD^. The reason why we want
HEAD^ isn't obvious (IIRC it's because the submodule upstream is 2
commits ahead because we add the deepsubmodule in a separate commit), in
my opinion, and I got tripped up quite a few times trying to read and
understand the test. That's a lot of effort to spend on irrelevant
information - the test actually cares about what it fetched, not where
the ref used to be.

So for that reason, I'd prefer to remove test_cmp for this test.