On Sat, Jul 6, 2024 at 2:01 AM Jeff King <peff@xxxxxxxx> wrote: > On Tue, Jul 02, 2024 at 07:50:34PM -0400, Eric Sunshine wrote: > I'll post some patches in a moment: > > [1/3]: chainlint.pl: fix line number reporting > [2/3]: t/chainlint: add test_expect_success call to test snippets > [3/3]: t/chainlint: add tests for test body in heredoc > > with the idea that we'd apply your patch here on top of what Junio has > queued in jk/test-body-in-here-doc, and then these three on top. For > Junio's sanity, I'll roll it all up into one series. But I wanted to > show it to you incrementally first, especially because I think the fixes > from patch 1/3 above should probably just get squashed in (or even > rewritten). I'll discuss the bugs they fix below. Considering the excellent explanation you crafted in your patch, I'd like to say that it should remain separate from mine. However, since you caught the problems in review, it would be irresponsible of us to let my patch into the permanent history as-is. So, feel free to squash your fixes into my patch. Perhaps add a Co-authored-by:? The bit from your [1/3] commit message about incrementing $lineno for the heredoc-body case might be worth squashing in too? I wrote one minor (perhaps non-actionable) comment in response to patch [3/3]. The patches all looked fine to me, so: Acked-by: Eric Sunshine <sunshine@xxxxxxxxxxxxxx> > > @@ -232,7 +234,8 @@ sub new { > > my $self = bless { > > buff => [], > > stop => [], > > - output => [] > > + output => [], > > + heredocs => {}, > > } => $class; > > $self->{lexer} = Lexer->new($self, $s); > > return $self; > > I think initializing is not strictly necessary here, since we'd only try > to read tags if we saw a here-doc. But there might be some invalid cases > where we could convince higher-level code to look for tags even though > there were none (and generate a perl warning about trying to dereference > undef as a hashref). You're right, it's not necessary to initialize here, but it feels more consistent to do so. That said, I don't feel strongly either way. > On the flip side, what about cleaning up? The "heretags" array is > emptied as we parse the heredocs in swallow_heredocs(). But I think once > a ShellParser's $self->{heredocs}->{FOO} is written, it will hang around > forever (even though it's valid only for that one command). Probably not > a big deal, but there's probably some correct spot to reset it. There are a few reasons I wasn't overly concerned about cleaning up in this case: (1) The parsers are all short-lived, so the collected heredoc bodies won't stick around long anyhow. For each test checked, a TestParser is created and destroyed. For each script mentioned on the command-line, a ScriptParser is created and destroyed. None of these parsers stick around for long, though, a ScriptParser outlives a TestParser. (2) The heredoc bodies in question tend to be pretty small, so it's not consuming an inordinate amount of memory even if a single parser latches bodies of multiple heredocs. (3) We tend to be quite consistent about naming our heredoc tag (i.e. "EOF", "EOT"), so a latched body in the parser's %heredocs hash is very likely to get overwritten, thus the hash is probably not going to eat up a lot of memory. Given the entire test suite, I'd be quite surprised if any one parser ever latches more than three heredoc bodies at a time, and the vast majority of parsers are likely latching zero or one heredoc body. (4) I couldn't really think of a correct spot to reset %heredocs. That said, after reading your message, I did try implementing an approach in which the heredoc body gets attached to the `<<` or `<<-` token. That way, a heredoc body would be cleaned along with its associated lexer token. However, the implementation got too ugly and increased cognitive load too much for my liking, so I abandoned it. > > sub check_test { > > my $self = shift @_; > > - my ($title, $body) = map(unwrap, @_); > > + my $title = unwrap(shift @_); > > + my $body = unwrap(shift @_); > > + $body = shift @_ if $body eq '-'; > > $self->{ntests}++; > > my $parser = TestParser->new(\$body); > > my @tokens = $parser->parse(); > > This has two problems related to line numbers. You can't see it in the > context, but we later do: > > my $lineno = $_[1]->[3]; > > Now that we're shifting @_, that array item is gone. Ugh, this is embarrassing. I did run chainlint.pl on t0600 in which I had intentionally broken some &&-chains, so I saw the output, but somehow I overlooked that it broke the line numbering entirely. > The second is that the line number for the here-doc is actually one past > the initial line number of the test_expect_success. That works > automatically for hanging single-quotes, since the newline from that > line is inside the quoted area. But for a here-doc, we have to account > for it manually. In my original patch I prepended "\n", but you can also > just increment $lineno (which is what I did in the fix I'm about to > send). Nicely spotted. Simply incrementing $lineno does feel a bit hacky, but I agree that it is probably good enough for now; it doesn't seem likely that it will break any time soon. But I also agree with the commentary you wrote in patch [1/3] that it probably would be easy enough to latch the line number of the beginning of the heredoc body and employ that value. That would certainly be more robust. > > @@ -649,8 +654,13 @@ sub parse_cmd { > > return @tokens unless @tokens && $tokens[0]->[0] =~ /^test_expect_(?:success|failure)$/; > > my $n = $#tokens; > > $n-- while $n >= 0 && $tokens[$n]->[0] =~ /^(?:[;&\n|]|&&|\|\|)$/; > > - $self->check_test($tokens[1], $tokens[2]) if $n == 2; # title body > > - $self->check_test($tokens[2], $tokens[3]) if $n > 2; # prereq title body > > + my $herebody; > > + if ($n >= 2 && $tokens[$n-1]->[0] eq '-' && $tokens[$n]->[0] =~ /^<<-?(.+)$/) { > > + $herebody = $self->{heredocs}->{$1}; > > + $n--; > > + } > > + $self->check_test($tokens[1], $tokens[2], $herebody) if $n == 2; # title body > > + $self->check_test($tokens[2], $tokens[3], $herebody) if $n > 2; # prereq title body > > return @tokens; > > } > > OK, mostly as expected. I think the check for "-" here is redundant with > what's in check_test(). We could just feed the heredoc body either way, > and in the nonsense case of: > > test_expect_success 'title' 'test body' <<EOT > nobody reads this! > EOT > > the heredoc data would just be ignored. Right. I went back and forth with this, never sure if this code was overkill. On the other hand, we could make this more paranoid and complain if we see either of these cases: (1) "-" but no heredoc (2) heredoc present but something other than "-" > Requiring "<<" at the end is somewhat limiting. E.g. this is valid: > > test_expect_success <<EOT 'title' - > the test body > EOT True, I didn't even think about that. > I don't expect anybody to do that, but it would be nice to be more > robust if we can. I think the tokens are still wrapped at this point, so > we could read through all of them looking for "<<" anywhere, without > getting confused by "$(cat <<INNER_HEREDOC)". I think, anyway (I didn't > test). Correct. The stuff inside "$(...)" does get parsed and linted, but by the time ScriptParser::parse_cmd() sees it, `$(cat <<INNER_HEREDOC)` is just a single (string) token.