On Tue, Jul 02, 2024 at 05:37:39PM -0400, Eric Sunshine wrote: > > BTW, there's one non-obvious thing here about why this works. You'd > > think that: > > > > test_expect_success 'foo' <<\EOT > > cat <<-\EOF > > this is a here-doc > > EOF > > echo ok > > EOT > > > > wouldn't work, because the lexer only has a single here-doc store, and > > the inner one is going to overwrite the outer. But we don't lex the > > inner contents of the test snippet until we've processed the > > test_expect_success line, at which point we've copied it out. > > > > So I dunno. It feels a bit hacky, but I think it's how you have to do it > > anyway. > > It wasn't non-obvious to me, but I suppose it's because I know the > author, or I am the author, or something. :) I had a brief moment of panic where I thought "wait, what I sent out is going to break in this case!" and then was surprised when it worked. > > > - $n-- while $n >= 0 && $tokens[$n]->[0] =~ /^(?:[;&\n|]|&&|\|\|)$/; > > > + $n-- while $n >= 0 && $tokens[$n]->[0] =~ /^(?:[;&\n|]|&&|\|\||<<[A-Za-z]+)$/; > > > > One curiosity I noted is that the backslash of my "<<\EOT" seems to be > > eaten by the lexer (I guess because it doesn't know the special meaning > > of backslash here, and just does the usual "take the next char > > literally"). > > That's not the reason. It actively strips the backslash because it > knows that it doesn't care about it after this point and, more > importantly, because it needs to extract the raw heredoc tag name > (without the slash or other surrounding quotes) so that it can match > upon that name (say, "EOF") to find the end of the heredoc body. > > It's mostly an accident of implementation (and probably a throwback to > chainlint.sed) that it strips the backslash early in > Lexer::scan_heredoc_tag() even though it doesn't actually have to be > stripped until Lexer::swallow_heredocs() needs to match the tag name > to find the end of the heredoc body. Thus, in retrospect, the > implementation could have retained the backslash (`\EOF`) or quotes > (`'EOF'` or `"EOF"`) and left it for swallow_heredocs() to strip them > only when needed. OK. I think it does make things easier to normalize this a bit, so that ScriptParser::parse_cmd() doesn't have to worry about all of the various spellings. If we recorded a single bit for "this was quoted" alongside the heredoc contents, that would be plenty. But as I (erroneously) said elsewhere, we can worry about that later if we find something useful to do with it. > There's another weird throwback to chainlint.sed in > Lexer::scan_heredoc_tag() where it transforms `<<-` into `<<\t`, which > is potentially more than a little confusing, especially since it is (I > believe) totally unnecessary in the context of chainlint.pl. Ah, I hadn't noticed that. Looks like we use it in swallow_heredocs() to read the tag data itself. But importantly the token stream still has the correct original in it, which we need to correctly match in ScriptParser::parse_cmd(). > > I think that is OK for our purposes here, though we might > > in the long run want to raise a linting error if you accidentally used > > an interpolating here-doc (it's not strictly wrong to do so, but I think > > we generally frown on it as a style thing). > > Such a linting warning would probably have to be context-sensitive so > it only triggers for test_expect_* calls. Yes, definitely. -Peff