On Mon, Jul 1, 2024 at 8:51 PM Jeff King <peff@xxxxxxxx> wrote: > On Mon, Jul 01, 2024 at 06:45:19PM -0400, Eric Sunshine wrote: > > We lose `chainlint` functionality for test bodies specified in this manner. > > Hmm. The patch below seems to work on a simple test. > > The lexer stuffs the heredoc into a special variable. Which at first > glance feels like a hack versus returning it from the token stream, but > the contents really _aren't_ part of that stream. They're a separate > magic thing that is found on the stdin of whatever command the tokens > represent. I created a white-room fix for this issue, as well, before taking a look at your patch. The two implementations bear a strong similarity which suggests that we agree upon the basic approach. My implementation, however, takes a more formal and paranoid stance. Rather than squirreling away only the most-recently-seen heredoc body, it stores each heredoc body along with the tag which introduced it. This makes it robust against cases when multiple heredocs are initiated on the same line (even within different parse contexts): cat <<EOFA && x=$(cat <<EOFB && A body EOFA B body EOFB Of course, that's not likely to come up in the context of test_expect_* calls, but I prefer the added robustness over the more lax approach. > And then ScriptParser::parse_cmd() just has to recognize that any "<<" > token isn't interesting, and that "-" means "read the here-doc". In my implementation, the `<<` token is "interesting" because the heredoc tag is attached to it, and the tag is needed to pluck the heredoc body from the set of saved bodies (since my implementation doesn't assume most-recently-seen body is the correct one). > Obviously we'd want to add to the chainlint tests here. It looks like > the current test infrastructure is focused on evaluating snippets, with > the test_expect_success part already handled. Yes, the "snippet" approach is a throwback to the old chainlint.sed implementation when there wasn't any actual parsing going on. As you note, this unfortunately does not allow for testing parsing-related aspects of the implementation, which is a limitation I most definitely felt when chainlint.pl was implemented. It probably would be a good idea to update the infrastructure to allow for more broad testing but that doesn't need to be part of the changes being discussed here. > diff --git a/t/chainlint.pl b/t/chainlint.pl > @@ -168,12 +168,15 @@ sub swallow_heredocs { > if (pos($$b) > $start) { > my $body = substr($$b, $start, pos($$b) - $start); > + $self->{parser}->{heredoc} .= > + substr($body, 0, length($body) - length($&)); > $self->{lineno} += () = $body =~ /\n/sg; In my implementation, I use regex to strip off the ending tag before storing the heredoc body. When I later looked at your implementation, I noticed that you used substr() -- which seems preferable -- but discovered that it strips too much in some cases. For instance, in t0600, I saw that: cat >expected <<-\EOF && HEAD PSEUDO_WT_HEAD refs/bisect/wt-random refs/heads/main refs/heads/wt-main EOF was getting stripped down to: HEAD PSEUDO_WT_HEAD refs/bisect/wt-random refs/heads/main refs/heads/wt-ma{{missing-nl}} It wasn't immediately obvious why this was happening, though I didn't spend a lot of time trying to debug it. Although I think my implementation is complete, I haven't submitted it yet because I discovered that the changes you made to t1404 are triggering false-positives: # chainlint: t1404-update-ref-errors.sh # chainlint: existing loose ref is a simple prefix of new 120 prefix=refs/1l && 121 test_update_rejected a c e false b c/x d \ 122 '$prefix/c' exists; ?!AMP?! cannot create '$prefix/c/x' Unfortunately, I ran out of time, thus haven't tracked down this problem yet. I also haven't tested your implementation yet to determine if this is due to a change I made or due to a deeper existing issue with chainlint.pl.