Re: [PATCH] chainlint.pl: recognize test bodies defined via heredoc

Jeff King <peff@xxxxxxxx> · Mon, 8 Jul 2024 05:05:30 -0400

On Sun, Jul 07, 2024 at 11:40:19PM -0400, Eric Sunshine wrote:

> > On the flip side, what about cleaning up? The "heretags" array is
> > emptied as we parse the heredocs in swallow_heredocs(). But I think once
> > a ShellParser's $self->{heredocs}->{FOO} is written, it will hang around
> > forever (even though it's valid only for that one command). Probably not
> > a big deal, but there's probably some correct spot to reset it.
> 
> There are a few reasons I wasn't overly concerned about cleaning up in
> this case:
> 
> (1) The parsers are all short-lived, so the collected heredoc bodies
> won't stick around long anyhow. For each test checked, a TestParser is
> created and destroyed. For each script mentioned on the command-line,
> a ScriptParser is created and destroyed. None of these parsers stick
> around for long, though, a ScriptParser outlives a TestParser.
> 
> (2) The heredoc bodies in question tend to be pretty small, so it's
> not consuming an inordinate amount of memory even if a single parser
> latches bodies of multiple heredocs.
> 
> (3) We tend to be quite consistent about naming our heredoc tag (i.e.
> "EOF", "EOT"), so a latched body in the parser's %heredocs hash is
> very likely to get overwritten, thus the hash is probably not going to
> eat up a lot of memory. Given the entire test suite, I'd be quite
> surprised if any one parser ever latches more than three heredoc
> bodies at a time, and the vast majority of parsers are likely latching
> zero or one heredoc body.
> 
> (4) I couldn't really think of a correct spot to reset %heredocs.

All of that makes sense to me, especially (4). :)

> That said, after reading your message, I did try implementing an
> approach in which the heredoc body gets attached to the `<<` or `<<-`
> token. That way, a heredoc body would be cleaned along with its
> associated lexer token. However, the implementation got too ugly and
> increased cognitive load too much for my liking, so I abandoned it.

OK, thanks for trying. I do think sticking it into the token stream
would make sense, but if the implementation got tricky, it is probably
not worth the effort. We can always revisit it later if we find some
reason that it would be useful to do it that way.

> > OK, mostly as expected. I think the check for "-" here is redundant with
> > what's in check_test(). We could just feed the heredoc body either way,
> > and in the nonsense case of:
> >
> >   test_expect_success 'title' 'test body' <<EOT
> >   nobody reads this!
> >   EOT
> >
> > the heredoc data would just be ignored.
> 
> Right. I went back and forth with this, never sure if this code was
> overkill. On the other hand, we could make this more paranoid and
> complain if we see either of these cases:
> 
> (1) "-" but no heredoc
> (2) heredoc present but something other than "-"

Those seem like good things to check for, and not too hard to add. I'll
see if I can work up some tests.

> > Requiring "<<" at the end is somewhat limiting. E.g. this is valid:
> >
> >   test_expect_success <<EOT 'title' -
> >   the test body
> >   EOT
> 
> True, I didn't even think about that.
> 
> > I don't expect anybody to do that, but it would be nice to be more
> > robust if we can. I think the tokens are still wrapped at this point, so
> > we could read through all of them looking for "<<" anywhere, without
> > getting confused by "$(cat <<INNER_HEREDOC)". I think, anyway (I didn't
> > test).
> 
> Correct. The stuff inside "$(...)" does get parsed and linted, but by
> the time ScriptParser::parse_cmd() sees it, `$(cat <<INNER_HEREDOC)`
> is just a single (string) token.

OK, I'll see if I can generalize it a bit (and add a test).

-Peff