Re: [PATCH v2 12/27] userdiff tests: rewrite hunk header test infrastructure

Ævar Arnfjörð Bjarmason <avarab@xxxxxxxxx> · Mon, 15 Feb 2021 21:06:49 +0100

On Mon, Feb 15 2021, Johannes Sixt wrote:

> Am 15.02.21 um 16:44 schrieb Ævar Arnfjörð Bjarmason:
>> Rewrite the hunk header test infrastructure introduced in
>> bfa7d01413 (t4018: an infrastructure to test hunk headers,
>> 2014-03-21). See c228a5c077 (Merge branch 'js/userdiff-cc',
>> 2014-03-31) for the whole series that commit was part of.
>> As noted in an earlier commit that change introduced the regression of
>> not testing for the full hunk line, but just whether "RIGHT" appeared
>> on it[1]. A preceding commit fixed that specific issue, but we were
>> still left with the inflexibility of the approach described in the
>> now-deleted t/t4018/README.
>> I.e. to add any sort of new tests that used the existing test data
>> we'd either need to add more files like the recently added (but now
>> deleted) *.ctx files, using the filesystem as our test data structure,
>> or introduce more parsing for the custom file format we were growing
>> here.
>> Let's instead just move this over to using a custom test
>> function. This makes it trivial to add new tests by adding new
>> optional parameters to the function. Let's still keep the relevant
>> files in the "t/t4018/" subdirectory instead of adding ~1.5k
>> lines (and growing) to "t/t4018-diff-funcname.sh".
>> If this diff is viewed with "--color-moved=plain" we can see that
>> there are no changes to the lines being moved into the new *.sh files,
>> i.e. all the deletions are moves. I'm just adding boilerplate around
>> those existing lines.
>> The one-off refactoring was performed by an ad-hoc shellscript [2].
>> 1. https://lore.kernel.org/git/87wnvbbf2y.fsf@xxxxxxxxxxxxxxxxxxx/
>> 2.
>> 	#!/bin/sh
>> 	set -ex
>> 	git rm README*
>> 	for t in $(git ls-files ':!*.ctx')
>> 	do
>> 		lang=$(echo $t | sed 's/-.*//')
>> 		desc=$(echo $t | sed -E 's/^[^-]*-//' | tr - " ")
>> 		if ! test -e $lang.sh
>> 		then
>> 			cat >$lang.sh <<-EOF
>> 			#!/bin/sh
>> 			#
>> 			# See ../t4018-diff-funcname.sh's test_diff_funcname()
>> 			#
>> 			EOF
>> 		else
>> 			echo >>$lang.sh
>> 	        fi
>> 		(
>> 	            printf "test_diff_funcname '%s: %s' \\" "$lang" "$desc"
>> 	            echo
>> 	            printf "\t8<<%sEOF_HUNK 9<<%sEOF_TEST\n" '\' '\'
>> 	            cat $t.ctx
>> 	            printf "EOF_HUNK\n"
>> 	            cat $t
>> 	            printf "EOF_TEST\n"
>> 		) >>$lang.sh
>> 		chmod +x $lang.sh
>> 		git add $lang.sh
>> 	        git rm $t $t.ctx
>> 	done
>> Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@xxxxxxxxx>
>> ---
>
>> diff --git a/t/t4018/bash.sh b/t/t4018/bash.sh
>> new file mode 100755
>> index 0000000000..69144d9144
>> --- /dev/null
>> +++ b/t/t4018/bash.sh
>> @@ -0,0 +1,160 @@
>> +#!/bin/sh
>> +#
>> +# See ../t4018-diff-funcname.sh's test_diff_funcname()
>> +#
>> +
>> +test_diff_funcname 'bash: arithmetic function' \
>> +	8<<\EOF_HUNK 9<<\EOF_TEST
>> +RIGHT()
>> +EOF_HUNK
>> +RIGHT() ((
>> +
>> +    ChangeMe = "$x" + "$y"
>> +))
>> +EOF_TEST
>> +
>> +test_diff_funcname 'bash: bashism style compact' \
>> +	8<<\EOF_HUNK 9<<\EOF_TEST
>> +function RIGHT {
>> +EOF_HUNK
>> +function RIGHT {
>> +    function InvalidSyntax{
>> +        :
>> +        echo 'ChangeMe'
>> +    }
>> +}
>> +EOF_TEST
>> +
>> +test_diff_funcname 'bash: bashism style function' \
>> +	8<<\EOF_HUNK 9<<\EOF_TEST
>> +function RIGHT {
>> +EOF_HUNK
>> +function RIGHT {
>> +    :
>> +    echo 'ChangeMe'
>> +}
>> +EOF_TEST
>> [...]
>
> That is not my dream of "simple". But I'm not a userdiff author
> anymore, so...
>
> I don't know, yet, where this is heading or what the advantage is. At
> any rate,[...]

I originally started writing this because I noticed I could break the
userdiff.c patterns and still have all tests pass, i.e. if you screw up
the capture grouping you can go from:

    @@ -2,3 +2,3 @@ function        RIGHT   (       )       {
    @@ -2,3 +2,3 @@ RIGHT   (       )

to:

    @@ -2,3 +2,3 @@ function        RIGHT   (       )       {
    @@ -2,3 +2,3 @@          RIGHT  (       )

And we wouldn't care because we just "grep 'RIGHT'". In this case we
really care about the difference between "^[ \t]*(.*)$" and a broken
"^([ \t]*.*)$" so not having the tests structurally hide the difference
makes sense.
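
To make the difference concrete, here is a minimal sketch. It assumes
that "sed -E" is a fair stand-in for the pattern matching, which is not
how userdiff.c actually applies its patterns; the capture() helper and
the variable names are made up for this illustration.

```shell
#!/bin/sh
# Illustration only: sed -E stands in for the funcname matching; this is
# not how userdiff.c applies its patterns.
tab=$(printf '\t')
line=" ${tab}RIGHT${tab}(${tab})"

good_re="^[ ${tab}]*(.*)\$"	# leading whitespace outside the group
bad_re="^([ ${tab}]*.*)\$"	# leading whitespace inside the group

# Hypothetical helper: print what capture group 1 of regex $1 holds
# for input line $2.
capture () {
	printf '%s\n' "$2" | sed -E "s/$1/\\1/"
}

good=$(capture "$good_re" "$line")
bad=$(capture "$bad_re" "$line")

# A substring check in the spirit of "grep RIGHT" accepts both results.
for header in "$good" "$bad"
do
	printf '%s\n' "$header" | grep -q RIGHT && echo "grep: pass"
done

# An exact comparison only accepts the first.
want="RIGHT${tab}(${tab})"
test "$good" = "$want" && echo "exact: good pattern passes"
test "$bad" = "$want" || echo "exact: broken pattern fails"
```

Both results pass the substring check, while only the output of the
first pattern survives the exact comparison, which is the gap a
test_cmp-style check closes.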

> [...] "trivial to add new tests" was also the case when each test case
> was in its own file[...]

"trivial to add new tests by adding new optional parameters to the
function". I.e. aside from the s/grep/test_cmp/ change in 09/27 the
existing tests were OK if you wanted to test exactly what they expected,
and no more.

I think it just makes sense to have a test helper function instead, at
the cost of a little bit of boilerplate. As seen e.g. in 14/27 and later
in the series, we can then add new test modes and set per-test config
without needing the top-level dispatch loop to be aware of it.
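
For readers unfamiliar with the "8<<\EOF_HUNK 9<<\EOF_TEST" calling
convention, the following is a rough sketch of how a function can
consume two here-documents on separate file descriptors. It is not the
series' test_diff_funcname(): sketch_funcname() is a hypothetical name,
and a toy sed command stands in for running "git diff" and extracting
the hunk header.

```shell
#!/bin/sh
# Hypothetical helper; NOT the series' actual test_diff_funcname().
# fd 8 carries the expected header text, fd 9 the test input.
sketch_funcname () {
	desc=$1
	expect=$(cat <&8)
	# Toy stand-in for "git diff" plus hunk header extraction: take the
	# first input line mentioning RIGHT and strip leading whitespace.
	actual=$(sed -n '/RIGHT/{s/^[[:space:]]*//;p;q;}' <&9)
	if test "$expect" = "$actual"
	then
		echo "ok - $desc"
	else
		echo "not ok - $desc"
		return 1
	fi
}

# Both here-documents follow the call, in the order they are named.
sketch_funcname 'bash: bashism style function' \
	8<<\EOF_HUNK 9<<\EOF_TEST
function RIGHT {
EOF_HUNK
function RIGHT {
    :
    echo 'ChangeMe'
}
EOF_TEST
```

Since the expectation and the input are both ordinary arguments to one
function call, a further mode or per-test setting could be added as
another parameter without touching the callers that don't need it.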

> [...] Without the boilerplate!

I realize that's a matter of taste, i.e. when to come up with some
custom format vs. writing a function.

FWIW as someone who didn't author the format I've come across it N times
over the years and each time ended up being more confused than when
reading any custom test function we have.

For those you can usually just look at the definition/arguments, whereas
this always required a careful read of t4018-diff-funcname.sh.

I also find it easier to have one ~160 line file in my editor than ~150
lines spread over 15 files, as in the recent addition of bash support in
2ff6c34612 (userdiff: support Bash, 2020-10-22).

It also depends on how you're counting boilerplate. If you're looking at
it as a patch on the ML, it would be ~160 lines of bash.sh vs. ~150
lines of the same content, if we're counting the boilerplate diff of 6
lines for every new file :)