Re: [PATCH 1/2] test-lib: allow test snippets as here-docs

Eric Sunshine <sunshine@xxxxxxxxxxxxxx> · Tue, 2 Jul 2024 17:37:39 -0400

On Mon, Jul 1, 2024 at 9:13 PM Jeff King <peff@xxxxxxxx> wrote:
> On Mon, Jul 01, 2024 at 08:51:45PM -0400, Jeff King wrote:
> > And then ScriptParser::parse_cmd() just has to recognize that any "<<"
> > token isn't interesting, and that "-" means "read the here-doc".
>
> BTW, there's one non-obvious thing here about why this works. You'd
> think that:
>
>   test_expect_success 'foo' <<\EOT
>         cat <<-\EOF
>         this is a here-doc
>         EOF
>         echo ok
>   EOT
>
> wouldn't work, because the lexer only has a single here-doc store, and
> the inner one is going to overwrite the outer. But we don't lex the
> inner contents of the test snippet until we've processed the
> test_expect_success line, at which point we've copied it out.
>
> So I dunno. It feels a bit hacky, but I think it's how you have to do it
> anyway.

It wasn't non-obvious to me, but I suppose it's because I know the
author, or I am the author, or something.

> > -     $n-- while $n >= 0 && $tokens[$n]->[0] =~ /^(?:[;&\n|]|&&|\|\|)$/;
> > +     $n-- while $n >= 0 && $tokens[$n]->[0] =~ /^(?:[;&\n|]|&&|\|\||<<[A-Za-z]+)$/;
>
> One curiosity I noted is that the backslash of my "<<\EOT" seems to be
> eaten by the lexer (I guess because it doesn't know the special meaning
> of backslash here, and just does the usual "take the next char
> literally").

That's not the reason. It actively strips the backslash because it
knows that it doesn't care about it after this point and, more
importantly, because it needs to extract the raw heredoc tag name
(without the slash or other surrounding quotes) so that it can match
upon that name (say, "EOF") to find the end of the heredoc body.

It's mostly an accident of implementation (and probably a throwback to
chainlint.sed) that it strips the backslash early in
Lexer::scan_heredoc_tag() even though it doesn't actually have to be
stripped until Lexer::swallow_heredocs() needs to match the tag name
to find the end of the heredoc body. Thus, in retrospect, the
implementation could have retained the backslash (`\EOF`) or quotes
(`'EOF'` or `"EOF"`) and left it for swallow_heredocs() to strip them
only when needed.

There's another weird throwback to chainlint.sed in
Lexer::scan_heredoc_tag() where it transforms `<<-` into `<<\t`, which
is potentially more than a little confusing, especially since it is (I
believe) totally unnecessary in the context of chainlint.pl.

> I think that is OK for our purposes here, though we might
> in the long run want to raise a linting error if you accidentally used
> an interpolating here-doc (it's not strictly wrong to do so, but I think
> we generally frown on it as a style thing).

Such a linting warning would probably have to be context-sensitive so
it only triggers for test_expect_* calls.