On Mon, Jul 1, 2024 at 9:13 PM Jeff King <peff@xxxxxxxx> wrote: > On Mon, Jul 01, 2024 at 08:51:45PM -0400, Jeff King wrote: > > And then ScriptParser::parse_cmd() just has to recognize that any "<<" > > token isn't interesting, and that "-" means "read the here-doc". > > BTW, there's one non-obvious thing here about why this works. You'd > think that: > > test_expect_success 'foo' <<\EOT > cat <<-\EOF > this is a here-doc > EOF > echo ok > EOT > > wouldn't work, because the lexer only has a single here-doc store, and > the inner one is going to overwrite the outer. But we don't lex the > inner contents of the test snippet until we've processed the > test_expect_success line, at which point we've copied it out. > > So I dunno. It feels a bit hacky, but I think it's how you have to do it > anyway. It wasn't non-obvious to me, but I suppose it's because I know the author, or I am the author, or something. > > - $n-- while $n >= 0 && $tokens[$n]->[0] =~ /^(?:[;&\n|]|&&|\|\|)$/; > > + $n-- while $n >= 0 && $tokens[$n]->[0] =~ /^(?:[;&\n|]|&&|\|\||<<[A-Za-z]+)$/; > > One curiosity I noted is that the backslash of my "<<\EOT" seems to be > eaten by the lexer (I guess because it doesn't know the special meaning > of backslash here, and just does the usual "take the next char > literally"). That's not the reason. It actively strips the backslash because it knows that it doesn't care about it after this point and, more importantly, because it needs to extract the raw heredoc tag name (without the slash or other surrounding quotes) so that it can match upon that name (say, "EOF") to find the end of the heredoc body. It's mostly an accident of implementation (and probably a throwback to chainlint.sed) that it strips the backslash early in Lexer::scan_heredoc_tag() even though it doesn't actually have to be stripped until Lexer::swallow_heredocs() needs to match the tag name to find the end of the heredoc body. Thus, in retrospect, the implementation could have retained the backslash (`\EOF`) or quotes (`'EOF'` or `"EOF"`) and left it for swallow_heredocs() to strip them only when needed. There's another weird throwback to chainlint.sed in Lexer::scan_heredoc_tag() where it transforms `<<-` into `<<\t`, which is potentially more than a little confusing, especially since it is (I believe) totally unnecessary in the context of chainlint.pl. > I think that is OK for our purposes here, though we might > in the long run want to raise a linting error if you accidentally used > an interpolating here-doc (it's not strictly wrong to do so, but I think > we generally frown on it as a style thing). Such a linting warning would probably have to be context-sensitive so it only triggers for test_expect_* calls.