Re: [PATCH] chainlint.pl: recognize test bodies defined via heredoc

Jeff King <peff@xxxxxxxx> · Sat, 6 Jul 2024 02:01:43 -0400

On Tue, Jul 02, 2024 at 07:50:34PM -0400, Eric Sunshine wrote:

> This is a clean-room implementation which serves the same purpose as a
> change proposed[1] by Peff; it was created before I looked at Peff's
> proposal. The two independent implementations turned out quite similar,
> but the one implemented by this patch takes a more formal and paranoid
> stance. In particular, unlike Peff's patch, it doesn't trust that the
> most-recently-seen heredoc body is one associated with the
> `test_expect_success` invocation.

Thanks for working on this! I think this is better than the patch I
showed earlier. But I am still glad to have worked on that one, because
there is no way I'd be able to intelligently review that one without
having poked at the code so much myself.

> This patch can sit either at the top or bottom of Peff's series[2].
> 
> There was also related discussion of improving the chainlint self-test
> infrastructure[3], however, such proposed changes needn't hold up Peff's
> series[2]; such improvements can be applied after the dust settles. On
> the other hand, Peff, if you plan to reroll for some reason, feel free
> to incorporate this patch into your series.

IMHO we want it all to come together. We should not allow "<<\EOT"
without making sure we can chainlint the test bodies, and we should not
make such a big change to chainlint.pl without tests to make sure it
works.

I'll post some patches in a moment:

  [1/3]: chainlint.pl: fix line number reporting
  [2/3]: t/chainlint: add test_expect_success call to test snippets
  [3/3]: t/chainlint: add tests for test body in heredoc

with the idea that we'd apply your patch here on top of what Junio has
queued in jk/test-body-in-here-doc, and then these three on top. For
Junio's sanity, I'll roll it all up into one series. But I wanted to
show it to you incrementally first, especially because I think the fixes
from patch 1/3 above should probably just get squashed in (or even
rewritten). I'll discuss the bugs they fix below.

> diff --git a/t/chainlint.pl b/t/chainlint.pl
> index 1bbd985b78..eba509b8e1 100755
> --- a/t/chainlint.pl
> +++ b/t/chainlint.pl
> @@ -174,6 +174,8 @@ sub swallow_heredocs {
>  		$$b =~ /(?:\G|\n)$indent\Q$$tag[0]\E(?:\n|\z)/gc;
>  		if (pos($$b) > $start) {
>  			my $body = substr($$b, $start, pos($$b) - $start);
> +			$self->{parser}->{heredocs}->{$$tag[0]} =
> +			    substr($body, 0, length($body) - length($&));
>  			$self->{lineno} += () = $body =~ /\n/sg;
>  			next;
>  		}

OK, this part looks familiar. :)

> @@ -232,7 +234,8 @@ sub new {
>  	my $self = bless {
>  		buff => [],
>  		stop => [],
> -		output => []
> +		output => [],
> +		heredocs => {},
>  	} => $class;
>  	$self->{lexer} = Lexer->new($self, $s);
>  	return $self;

I think initializing is not strictly necessary here, since we'd only try
to read tags if we saw a here-doc. But there might be some invalid cases
where we could convince higher-level code to look for tags even though
there were none (and generate a perl warning about trying to dereference
undef as a hashref).

On the flip side, what about cleaning up? The "heretags" array is
emptied as we parse the heredocs in swallow_heredocs(). But I think once
a ShellParser's $self->{heredocs}->{FOO} is written, it will hang around
forever (even though it's valid only for that one command). Probably not
a big deal, but there's probably some correct spot to reset it.

> @@ -616,7 +619,9 @@ sub unwrap {
>  
>  sub check_test {
>  	my $self = shift @_;
> -	my ($title, $body) = map(unwrap, @_);
> +	my $title = unwrap(shift @_);
> +	my $body = unwrap(shift @_);
> +	$body = shift @_ if $body eq '-';
>  	$self->{ntests}++;
>  	my $parser = TestParser->new(\$body);
>  	my @tokens = $parser->parse();

This has two problems related to line numbers. You can't see it in the
context, but we later do:

  my $lineno = $_[1]->[3];

Now that we're shifting @_, that array item is gone.

The second is that the line number for the here-doc is actually one past
the initial line number of the test_expect_success. That works
automatically for hanging single-quotes, since the newline from that
line is inside the quoted area. But for a here-doc, we have to account
for it manually. In my original patch I prepended "\n", but you can also
just increment $lineno (which is what I did in the fix I'm about to
send).

> @@ -649,8 +654,13 @@ sub parse_cmd {
>  	return @tokens unless @tokens && $tokens[0]->[0] =~ /^test_expect_(?:success|failure)$/;
>  	my $n = $#tokens;
>  	$n-- while $n >= 0 && $tokens[$n]->[0] =~ /^(?:[;&\n|]|&&|\|\|)$/;
> -	$self->check_test($tokens[1], $tokens[2]) if $n == 2; # title body
> -	$self->check_test($tokens[2], $tokens[3]) if $n > 2;  # prereq title body
> +	my $herebody;
> +	if ($n >= 2 && $tokens[$n-1]->[0] eq '-' && $tokens[$n]->[0] =~ /^<<-?(.+)$/) {
> +		$herebody = $self->{heredocs}->{$1};
> +		$n--;
> +	}
> +	$self->check_test($tokens[1], $tokens[2], $herebody) if $n == 2; # title body
> +	$self->check_test($tokens[2], $tokens[3], $herebody) if $n > 2;  # prereq title body
>  	return @tokens;
>  }

OK, mostly as expected. I think the check for "-" here is redundant with
what's in check_test(). We could just feed the heredoc body either way,
and in the nonsense case of:

  test_expect_success 'title' 'test body' <<EOT
  nobody reads this!
  EOT

the heredoc data would just be ignored.

Requiring "<<" at the end is somewhat limiting. E.g. this is valid:

  test_expect_success <<EOT 'title' -
  the test body
  EOT

I don't expect anybody to do that, but it would be nice to be more
robust if we can. I think the tokens are still wrapped at this point, so
we could read through all of them looking for "<<" anywhere, without
getting confused by "$(cat <<INNER_HEREDOC)". I think, anyway (I didn't
test).

I didn't address either of those comments in the patches I'm about to
send.

-Peff