Re: [PATCH] chainlint.pl: recognize test bodies defined via heredoc

Eric Sunshine <sunshine@xxxxxxxxxxxxxx> · Mon, 8 Jul 2024 16:06:47 -0400

On Mon, Jul 8, 2024 at 5:05 AM Jeff King <peff@xxxxxxxx> wrote:
> On Sun, Jul 07, 2024 at 11:40:19PM -0400, Eric Sunshine wrote:
> > (3) We tend to be quite consistent about naming our heredoc tag (i.e.
> > "EOF", "EOT"), so a latched body in the parser's %heredocs hash is
> > very likely to get overwritten, thus the hash is probably not going to
> > eat up a lot of memory. Given the entire test suite, I'd be quite
> > surprised if any one parser ever latches more than three heredoc
> > bodies at a time, and the vast majority of parsers are likely latching
> > zero or one heredoc body.

One thing we may want to measure is how much extra time we're wasting
for the (very) common case of latching heredoc bodies only to then
ignore them. In particular, we may want to add a flag to ShellParser
telling it whether or not to latch heredoc bodies, and enable that
flag in subclass ScriptParser, but leave it disabled in subclass
TestParser since only ScriptParser currently cares about the heredoc
body.

> > (4) I couldn't really think of a correct spot to reset %heredocs.
>
> All of that makes sense to me, especially (4). :)
>
> > That said, after reading your message, I did try implementing an
> > approach in which the heredoc body gets attached to the `<<` or `<<-`
> > token. That way, a heredoc body would be cleaned along with its
> > associated lexer token. However, the implementation got too ugly and
> > increased cognitive load too much for my liking, so I abandoned it.
>
> OK, thanks for trying. I do think sticking it into the token stream
> would make sense, but if the implementation got tricky, it is probably
> not worth the effort. We can always revisit it later if we find some
> reason that it would be useful to do it that way.

In the long run, I think we probably want to build a full parse tree,
attach relevant information (such as a heredoc body) to each node, and
then walk the tree, rather than trying to perform on-the-fly lints and
other operations on the token stream as is currently the case.

This encapsulation would not only solve the problem of releasing
related resources (such as releasing the heredoc body when the `<<` or
`<<-` node is released), but it would also make it possible to perform
other lints I've had in mind. For instance, a while ago, I added (but
did not submit) a lint to check for `cd` outside of a subshell. After
implementing that, I realized that the cd-outside-subshell lint would
be useful, not just within test bodies, but also at the script level
itself. However, because actual linting functionality resides entirely
in TestParser, I wasn't able to reuse the code for detecting
cd-outside-subshell at the script level, and ended up having to write
duplicate linting code in ScriptParser. If, on the other hand, the
linting code was just handed a parse tree, then it wouldn't matter if
that parse tree came from parsing a test body or parsing a script.

All (or most) of the checks in t/check-non-portable-shell.pl could
also be incorporated into chainlint.pl (though that makes the name
"chainlint.pl" even more of an anachronism than it already is since it
outgrew "chain linting" when it starting checking for missing `||
return`, if not before then.)