chainlint.pl's new "deparse" output (was: [PATCH v2] [...])

Ævar Arnfjörð Bjarmason <avarab@xxxxxxxxx> · Mon, 24 Oct 2022 11:57:36 +0200

[Please ignore the just-sent empty
https://lore.kernel.org/git/221024.86a65lee8i.gmgdl@xxxxxxxxxxxxxxxxxxx/;
local PBCAK problem :)]

On Tue, Sep 13 2022, Eric Sunshine via GitGitGadget wrote:

> From: Eric Sunshine <sunshine@xxxxxxxxxxxxxx>
>
> When `chainlint.pl` detects problems in a test definition, it emits the
> test definition with "?!FOO?!" annotations highlighting the problems it
> discovered. For instance, given this problematic test:
>
>     test_expect_success 'discombobulate frobnitz' '
>         git frob babble &&
>         (echo balderdash; echo gnabgib) >expect &&
>         for i in three two one
>         do
>             git nitfol $i
>         done >actual
>         test_cmp expect actual
>     '
>
> chainlint.pl will output:
>
>     # chainlint: t1234-confusing.sh
>     # chainlint: discombobulate frobnitz
>     git frob babble &&
>     (echo balderdash ; ?!AMP?! echo gnabgib) >expect &&
>     for i in three two one
>     do
>     git nitfol $i ?!LOOP?!
>     done >actual ?!AMP?!
>     test_cmp expect actual

I've noticed that chainlint.pl is better in some ways, but that the
"deparse" output tends to be quite jarring. but I can't find version of
it that emitted this "will output" here.

Before this patch, or fb41727b7ed (t: retire unused chainlint.sed,
2022-09-01) we'd emit this instead:

	git frob babble &&
	( echo balderdash ; ?!AMP?! echo gnabgib ) > expect &&
	for i in three two one
	do
	git nitfol $i ?!LOOP?!
	done > actual ?!AMP?!
	test_cmp expect actual

The difference is in whitespace, e.g. "( ", not "(", "> " not ">".  This
is just because it's emitting "raw" tokenizer tokens.

Was there maybe some local version where the whitespace munging you're
doing against $checked was different & this commit message was left
over?

Anyway, that sort of an aside, but I did go hunting for the version with slightly better whitespace output.

But to get to the actual point: I've found the new chainlint.pl output
harder to read sometimes, because it goes through this parse & deparse
state, so you're preserving "\n"''s.

Whereas the old "sed" output also sucked because we couldn't note where
the issue was, but we spewed out the test source verbatim.

But it seem to me that we could get much better output if the
ShellParser/Lexer etc. just kept enough state to emit "startcol",
"endcol" and "linenum" along with every token, or something like that
(you might want offsets from the beginning of the parsed source
instead).

Then when it has errors it could emit the actual source passed in, and
even do gcc/clang-style underlining.

I poked at getting that working for a few minutes, but quickly saw that
someone more familiar with the code could do it much quicker, so
consider the above a feature request :)

Another thing: When a test *ends* in a "&&" (common when you copy/paste
e.g. "test_cmp expect actual &&\n" from another test) it doesn't spot
it, but instead we get all the way to the eval/117, i.e. "broken
&&-chain or run-away HERE-DOC".

More feature requests (because for some reason you've got infinite time,
but I don't :): This software is really close to being able to also
change the tests on the fly. If you could define callbacks where you
could change subsets of the parse stream, say a single command like:

	grep some.*rx file

Tokenized as:

	["grep", "some.*rx" "file"]

If you could define an interface to have a callback function e.g. as:

	sub munge_line_tokens {
		my $t = shift;

                return unless $t->[0] eq "grep"; # no changes
                my @t = @$t;

                return [qw(if ! grep), @t[1..$#t],
                	qw(then cat), $t[-1], qw(&& false fi)];
	}

So we could rewrite that into:

        if ! grep some.*rx foo
	then
		cat foo &&
		false
	fi

And other interesting auto-fixups and borderline coccinelle
transformations, e.g. changing our various:

	test "$(git ...) = "" &&

Into:

	git ... >out &&
	test_must_be_empty out