Re: [PATCH 1/2] test-lib: allow test snippets as here-docs

Eric Sunshine <sunshine@xxxxxxxxxxxxxx> · Sat, 6 Jul 2024 02:11:13 -0400

On Sat, Jul 6, 2024 at 1:31 AM Jeff King <peff@xxxxxxxx> wrote:
> On Tue, Jul 02, 2024 at 05:25:48PM -0400, Eric Sunshine wrote:
> > My implementation, however, takes a more formal and paranoid stance.
> > Rather than squirreling away only the most-recently-seen heredoc body,
> > it stores each heredoc body along with the tag which introduced it.
> > This makes it robust against cases when multiple heredocs are
> > initiated on the same line (even within different parse contexts):
> >
> >     cat <<EOFA && x=$(cat <<EOFB &&
> >     A body
> >     EOFA
> >     B body
> >     EOFB
> >
> > Of course, that's not likely to come up in the context of
> > test_expect_* calls, but I prefer the added robustness over the more
> > lax approach.
>
> Yes, that's so much better than what I wrote. I didn't engage my brain
> very much when I read the in-code comments about multiple tags on the
> same line, and I thought you meant:
>
>   cat <<FOO <<BAR
>   this is foo
>   FOO
>   this is bar
>   BAR
>
> which is...weird. It does "work" in the sense that "FOO" is a here-doc
> that should be skipped past. But it is not doing anything useful; cat
> sees only "this is bar" on stdin. So even for this case, the appending
> behavior that my patch does would not make sense.
>
> And of course for the actual useful thing, which you wrote above,
> appending is just nonsense. Recording and accessing by tag is the right
> thing.

In retrospect, I think my claim is bogus in the context of
ScriptParser::parse_cmd(). Specifically, ScriptParser::parse_cmd()
calls its parent ShellParser::parse_cmd() to latch one command.
ShellParser::parse_cmd() stops parsing as soon as it encounters a
command terminator (i.e. `;`, `&&`, `||`, `|`, `&`, `\n`) and returns
the command. Moreover, per the shell language specification, the lexer
consumes heredoc bodies only upon encountering `\n`. Thus, if someone
writes:

    test_expect_success title - <<\EOT && whatever &&
    ...test body...
    EOT

then ScriptParser::parse_cmd() will receive the command
`test_expect_success title -` from ShellParser::parse_cmd() but the
heredoc will not yet have been consumed by the lexer since it hasn't
yet encountered the newline[1].

So, the above example simply can't work correctly, because
ScriptParser::parse_cmd() calls ScriptParser::check_test() as soon as
it encounters a `test_expect_success/failure` invocation, without
knowing whether the heredocs have been latched at that point. To make
it properly robust, rather than calling check_test() immediately, it
would have to continue consuming commands, saving those which match a
`test_expect_success/failure` invocation, until it finally hits a
`\n`, and only then call check_test() on each saved command. But
that's probably overkill at this point considering that we never
write code like the above, so the submitted patch[2] is probably good
enough for now.
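
For illustration only, here is a toy sketch of that deferred approach.
It is written in shell rather than chainlint's Perl, and it uses a
made-up token format: one token per input line, with the literal word
NL standing in for a newline. The name deferred_check() is
hypothetical and is not anything in chainlint.pl:

```shell
# Toy model (hypothetical; not chainlint.pl code).
# Input: one token per line. A token is a command's text, a terminator
# such as "&&", or the word NL, which stands in for the newline at
# which the lexer would consume pending heredoc bodies.
deferred_check() {
    pending=""
    while IFS= read -r tok; do
        case $tok in
        test_expect_success*|test_expect_failure*)
            # Save rather than check now: heredocs are not yet latched.
            pending="${pending}${tok}
"
            ;;
        NL)
            # Newline seen: heredocs are available, so check saved commands.
            printf '%s' "$pending" | while IFS= read -r cmd; do
                echo "check_test: $cmd"
            done
            pending=""
            ;;
        "&&"|"||"|";"|"|"|"&")
            # Command terminators need no action in this toy model.
            ;;
        *)
            echo "command: $tok"
            ;;
        esac
    done
}

result=$(printf '%s\n' 'test_expect_success title -' '&&' 'whatever' '&&' NL |
    deferred_check)
echo "$result"
```

With that input, the `whatever` command is reported first and the
saved test_expect_success command is checked only once NL arrives.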

FOOTNOTES

[1] One might rightly ask: if ShellParser::parse_cmd() returns
immediately upon seeing a command terminator (i.e. `;`, `&&`, etc.),
then how is it that even a simple:

    test_expect_success title - <<\EOT &&
    ...test body...
    EOT

can work correctly, since the `\n` comes after the `&&`? The answer is
that, as a special case, the very last thing ShellParser::parse_cmd()
does is peek ahead to see if a `\n` follows the command terminator
(assuming the terminator is not itself a `\n`). When the next token is
indeed a `\n`, that peek operation causes the lexer to consume the
heredocs.
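
The underlying shell rule, that a heredoc body begins only after the
next newline no matter what else follows the redirection on that line,
can be checked directly. This is a minimal sketch with made-up
contents, meant to be run in any POSIX-ish shell:

```shell
# The redirection and a second command share one line; the heredoc
# body is read only after the newline which ends that line.
out=$(
cat <<\EOT && echo "ran after cat"
body line
EOT
)
echo "$out"
```

Here `cat` receives "body line" on stdin even though `&& echo ...`
sits between the `<<\EOT` operator and the body.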

[2] https://lore.kernel.org/git/20240702235034.88219-1-ericsunshine@xxxxxxxxxxx/