Re: [PATCH 1/2] test-lib: allow test snippets as here-docs

Jeff King <peff@xxxxxxxx> · Sat, 6 Jul 2024 01:44:15 -0400

On Tue, Jul 02, 2024 at 05:37:39PM -0400, Eric Sunshine wrote:

> > BTW, there's one non-obvious thing here about why this works. You'd
> > think that:
> >
> >   test_expect_success 'foo' <<\EOT
> >         cat <<-\EOF
> >         this is a here-doc
> >         EOF
> >         echo ok
> >   EOT
> >
> > wouldn't work, because the lexer only has a single here-doc store, and
> > the inner one is going to overwrite the outer. But we don't lex the
> > inner contents of the test snippet until we've processed the
> > test_expect_success line, at which point we've copied it out.
> >
> > So I dunno. It feels a bit hacky, but I think it's how you have to do it
> > anyway.
> 
> It wasn't non-obvious to me, but I suppose it's because I know the
> author, or I am the author, or something.

:) I had a brief moment of panic where I thought "wait, what I sent out
is going to break in this case!" and then was surprised when it worked.

> > > -     $n-- while $n >= 0 && $tokens[$n]->[0] =~ /^(?:[;&\n|]|&&|\|\|)$/;
> > > +     $n-- while $n >= 0 && $tokens[$n]->[0] =~ /^(?:[;&\n|]|&&|\|\||<<[A-Za-z]+)$/;
> >
> > One curiosity I noted is that the backslash of my "<<\EOT" seems to be
> > eaten by the lexer (I guess because it doesn't know the special meaning
> > of backslash here, and just does the usual "take the next char
> > literally").
> 
> That's not the reason. It actively strips the backslash because it
> knows that it doesn't care about it after this point and, more
> importantly, because it needs to extract the raw heredoc tag name
> (without the slash or other surrounding quotes) so that it can match
> upon that name (say, "EOF") to find the end of the heredoc body.
> 
> It's mostly an accident of implementation (and probably a throwback to
> chainlint.sed) that it strips the backslash early in
> Lexer::scan_heredoc_tag() even though it doesn't actually have to be
> stripped until Lexer::swallow_heredocs() needs to match the tag name
> to find the end of the heredoc body. Thus, in retrospect, the
> implementation could have retained the backslash (`\EOF`) or quotes
> (`'EOF'` or `"EOF"`) and left it for swallow_heredocs() to strip them
> only when needed.

OK. I think it does make things easier to normalize this a bit, so that
ScriptParser::parse_cmd() doesn't have to worry about all of the various
spellings. If we recorded a single bit for "this was quoted" alongside
the heredoc contents, that would be plenty. But as I (erroneously) said
elsewhere, we can worry about that later if we find something useful to
do with it.

> There's another weird throwback to chainlint.sed in
> Lexer::scan_heredoc_tag() where it transforms `<<-` into `<<\t`, which
> is potentially more than a little confusing, especially since it is (I
> believe) totally unnecessary in the context of chainlint.pl.

Ah, I hadn't noticed that. Looks like we use it in swallow_heredocs() to
read the tag data itself. But importantly the token stream still has
the correct original in it, which we need to correctly match in
ScriptParser::parse_cmd().

> > I think that is OK for our purposes here, though we might
> > in the long run want to raise a linting error if you accidentally used
> > an interpolating here-doc (it's not strictly wrong to do so, but I think
> > we generally frown on it as a style thing).
> 
> Such a linting warning would probably have to be context-sensitive so
> it only triggers for test_expect_* calls.

Yes, definitely.

-Peff