On Sat, Jul 6, 2024 at 1:31 AM Jeff King <peff@xxxxxxxx> wrote:
> On Tue, Jul 02, 2024 at 05:25:48PM -0400, Eric Sunshine wrote:
> > My implementation, however, takes a more formal and paranoid stance.
> > Rather than squirreling away only the most-recently-seen heredoc body,
> > it stores each heredoc body along with the tag which introduced it.
> > This makes it robust against cases when multiple heredocs are
> > initiated on the same line (even within different parse contexts):
> >
> >     cat <<EOFA && x=$(cat <<EOFB &&
> >     A body
> >     EOFA
> >     B body
> >     EOFB
> >
> > Of course, that's not likely to come up in the context of
> > test_expect_* calls, but I prefer the added robustness over the more
> > lax approach.
>
> Yes, that's so much better than what I wrote. I didn't engage my brain
> very much when I read the in-code comments about multiple tags on the
> same line, and I thought you meant:
>
>     cat <<FOO <<BAR
>     this is foo
>     FOO
>     this is bar
>     BAR
>
> which is...weird. It does "work" in the sense that "FOO" is a here-doc
> that should be skipped past. But it is not doing anything useful; cat
> sees only "this is bar" on stdin. So even for this case, the appending
> behavior that my patch does would not make sense.
>
> And of course for the actual useful thing, which you wrote above,
> appending is just nonsense. Recording and accessing by tag is the right
> thing.

In retrospect, I think my claim is bogus in the context of
ScriptParser::parse_cmd(). Specifically, ScriptParser::parse_cmd()
calls its parent ShellParser::parse_cmd() to latch one command.
ShellParser::parse_cmd() stops parsing as soon as it encounters a
command terminator (i.e. `;`, `&&`, `||`, `|`, `&`, `\n`) and returns
the command. Moreover, by definition, given the language
specification, the lexer only consumes the heredocs upon encountering
`\n`. Thus, if someone writes:

    test_expect_success title - <<\EOT && whatever &&
    ...test body...
    EOT

then ScriptParser::parse_cmd() will receive the command
`test_expect_success title -` from ShellParser::parse_cmd() but the
heredoc will not yet have been consumed by the lexer since it hasn't
yet encountered the newline[1]. So, the above example simply can't
work correctly given the way ScriptParser::parse_cmd() calls
ScriptParser::check_test() as soon as it encounters a
`test_expect_success/failure` invocation since it doesn't know if the
heredocs have been latched at that point.

To make it properly robust, rather than immediately calling
check_test(), it would have to continue consuming commands, and saving
the ones which match `test_expect_success/failure` invocation, until
it finally hits a `\n`, and only then call check_test() with each
command it saved. But that's probably overkill at this point
considering that we never write code like the above, so the submitted
patch[2] is probably good enough for now.

FOOTNOTES

[1]: One might rightly ask that if ShellParser::parse_cmd() returns
immediately upon seeing a command terminator (i.e. `;`, `&&`, etc.),
then how is it that even a simple:

    test_expect_success title - <<\EOT &&
    ...test body...
    EOT

can work correctly since the `\n` comes after the `&&`. The answer is
that, as a special case, the very last thing ShellParser::parse_cmd()
does is peek ahead to see if a `\n` follows the command terminator
(assuming the terminator is not itself a `\n`). When the next token is
indeed a `\n`, that peek operation causes the lexer to consume the
heredocs.

[2]: https://lore.kernel.org/git/20240702235034.88219-1-ericsunshine@xxxxxxxxxxx/
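
For illustration only, the "save the test commands and check them once
the newline is reached" idea described above can be sketched as a toy
model. This is not the actual parser code: lex(), heredoc_tag(), and
check_tests() are hypothetical names, tokens are split on whitespace
only, and the closing heredoc tag is matched loosely. The one property
the sketch tries to preserve is that heredoc bodies become available
only when the lexer reaches `\n`.

```python
import re

TERMINATORS = {";", "&&", "||", "|", "&", "\n"}
HEREDOC_OP = re.compile(r"<<-?\\?(\w+)")   # matches <<EOT, <<-EOT, <<\EOT
TEST_CMDS = ("test_expect_success", "test_expect_failure")


def lex(text, heredocs):
    """Toy lexer (hypothetical): yields whitespace-separated tokens and
    stores heredoc bodies in `heredocs`, keyed by tag, only upon
    reaching the end of the line which introduced them."""
    lines = iter(text.splitlines())
    for line in lines:
        pending = []
        for tok in line.split():
            m = HEREDOC_OP.fullmatch(tok)
            if m:
                pending.append(m.group(1))
            yield tok
        # The newline has been reached: consume each pending body in order.
        for tag in pending:
            body = []
            for body_line in lines:
                if body_line.strip() == tag:   # simplification: loose match
                    break
                body.append(body_line)
            heredocs[tag] = "\n".join(body)
        yield "\n"


def heredoc_tag(cmd):
    """Return the tag of the first heredoc operator in `cmd`, if any."""
    for tok in cmd:
        m = HEREDOC_OP.fullmatch(tok)
        if m:
            return m.group(1)
    return None


def check_tests(text):
    """Collect (tag, body) for each test_expect_* command, deferring the
    lookup until a newline terminator is seen, at which point the bodies
    have been stored by the lexer."""
    heredocs = {}
    saved = []     # test_expect_* commands awaiting the next newline
    checked = []   # stand-in for whatever check_test() would receive
    cmd = []
    for tok in lex(text, heredocs):
        if tok not in TERMINATORS:
            cmd.append(tok)
            continue
        if cmd and cmd[0] in TEST_CMDS:
            saved.append(cmd)
        cmd = []
        if tok == "\n":
            for test_cmd in saved:
                tag = heredoc_tag(test_cmd)
                checked.append((tag, heredocs.get(tag)))
            saved = []
    return checked


script = r"""test_expect_success title - <<\EOT && whatever &&
...test body...
EOT
"""
print(check_tests(script))   # [('EOT', '...test body...')]
```

Because the lookup is deferred to the newline, the sketch has no need
for the peek-ahead special case mentioned in [1]. Looking up the body
at the first `&&` instead would find nothing stored under `EOT` yet.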