On Mon, Jul 30, 2018 at 4:59 PM Jonathan Nieder <jrnieder@xxxxxxxxx> wrote: > Eric Sunshine wrote: > > The subshell linter would normally fold out the here-doc content, but > > 'sed' isn't a proper programming language, so the linter can't > > recognize arbitrary here-doc tags. Instead it has hard-coded knowledge > > of the tags commonly used in the Git tests, specifically EOF, EOT, and > > INPUT_END. > > Oh, hmm. I also see some others (outside subshells, though): > > EXPECT_END > [...] Correct. The linter does fold-out top-level EOF here-docs to hedge against the body of a here-doc containing something that might look like the start of a subshell (which would activate the more strict/expensive &&-chain validation). It special-cases top-level EOF because it's such a common tag name, thus an easy way to avoid false-positives, in general. I didn't bother trying to recognize _every_ possible tag since those other here-docs don't trigger any problems. (It's heuristic-based, after all.) > I wonder if it should look for something like [A-Z][A-Z_]* to catch > all of these. I considered that, but it doesn't handle nested here-docs, which we actually have in the test suite. For instance, from t9300-fast-import: cat >input <<-INPUT_END && mark :2 data <<EOF $file2_data EOF ... INPUT_END Nesting could be handled easily enough either by stashing away the opening tag and matching against it later _or_ by doing recursive here-doc folding, however, 'sed' isn't a proper programming language and can't be coerced into doing either of those. (And, it was tricky enough just getting it to handle the nested case with a limited set of recognized tag names, without having to explicitly handle every combination of those names nested inside one another.) > > The linter also deals with multi-line $(...) expressions, however, it > > currently only recognizes them when the $( is on its own line. > > That's reasonable, especially if "on its own line" means "at end of > line". It does; sorry for not being clear. > What would help most is if the error message could explain what is > going on, but I understand that that can be hard to do in a sed > script. Right, unfortunately, it can't be too helpful, but, when fixing all those broken chains, I did find it useful to dump the result after 'sed' processing in order to identify what it was actually complaining about. I did so by changing the $1 in this line from test-lib.sh: error "bug in the test script: broken &&-chain or run-away HERE-DOC: $1" to print the 'sed' output instead. The linter prepends "??AMP??" and "??SEMI??" to suspect lines. It also prepends lines with ">" to indicate where it thinks a subshell ends. This information was quite helpful when figuring out what was broken in the test (or, for false positives, where a heuristic had gone wrong). I considered enabling this output by default instead of $1 but decided against it since it is only helpful for broken &&-chains in subshells, thus doesn't aid in the more common case of top-level &&-breakage, thus might be confusing. > > I could try to update the linter to not trip over this sort of input, > > however, this test code is indeed ugly and difficult to understand, > > and your rewrite[1] of it makes it far easier to grok, so I'm not sure > > the effort would be worthwhile. What do you think? > > I'd be happy to look over a change that handles more here-doc > delimiters or produces a clearer message for tests in poor style, but > I agree with you that it's not too important. I am, for a couple reasons, somewhat hesitant to tweak the heuristic. First, each tweak has the potential of causing more false-positives or (perhaps worse) false-negatives. The linter's own test-suite is supposed to protect against that, but test suite coverage is never perfect. Second, ideally, the linter should protect against new broken &&-chains from entering the codebase, so poorly coded historic tests such as these aren't necessarily good motivation for tweaking, _and_ it is (hopefully) unlikely that we would allow this sort of ugly shell code to enter the codebase going forward. (The counterargument is that this false-positive doesn't help someone coding up a new test who hasn't yet submitted the patch to the mailing list where more seasoned eyes would suggest better coding style.)