On Sun, Sep 11, 2022 at 1:28 AM Jeff King <peff@xxxxxxxx> wrote: > On Thu, Sep 01, 2022 at 12:29:38AM +0000, Eric Sunshine via GitGitGadget wrote: > > A while back, Peff successfully nerd-sniped[1] me into tackling a > > long-brewing idea I had about (possibly) improving "chainlint" performance > > I gave this a read-through, and it looks sensible overall. I have to > admit that I did not carefully check all of your regexes. Given the > relatively low stakes of the code (as an internal build-time tool only) > and the set of tests accompanying it, I'm willing to assume it's good > enough until we see counter-examples. Thanks for the feedback. > I posted some timings and thoughts on the use of threads elsewhere. But > in the end the timings are close enough that I don't care that much > either way. I ran my eye over that message quickly and have been meaning to dig into it and give it a proper response but haven't yet found the time. > I'd also note that I got some first-hand experience with the script as I > merged it with all of my other long-brewing topics, and it found a half > dozen spots, mostly LOOP annotations. At least one was a real "oops, > we'd miss a bug in Git here" spot. Several were "we'd probably notice > the problem because the loop output wouldn't be as expected". One was a > "we're on the left-hand of a pipe, so the exit code doesn't matter > anyway" case, but I am more than happy to fix those if it lets us be > linter-clean. Indeed, I'm not super happy about the linter complaining about cases which obviously can't have an impact on the test's outcome, but (as mentioned elsewhere in the thread), finally convinced myself that the relatively low number of these was outweighed by the quite large number of cases caught by the linter which could have let real problems slip though. Perhaps some day the linter can be made smarter about these cases. > The output took me a minute to adjust to, just because it feels pretty > jumbled when there are several cases. Mostly this is because the > script eats indentation. So it's hard to see the "# chainlint:" comment > starts, let alone the ?! annotations. Here's an example: > [...snip...] > It wasn't too bad once I got the hang of it, but I wonder if a user > writing a single test for the first time may get a bit overwhelmed. I > assume that the indentation is removed as part of the normalization (I > notice extra whitespace around "<", too). That might be hard to address. The script implements a proper parser and lexer, and the lexer is tokenizing the input (throwing away whitespace in the process), thus by the time the parser notices something to complain about with a "?!FOO?!" annotation, the original whitespace is long gone, and it just emits the token stream with "?!FOO?!" inserted at the correct place. In retrospect, the way this perhaps should have been done would have been for the parser to instruct the lexer to emit a "?!FOO?!" annotation at the appropriate point in the input stream. But even that might get a bit hairy since there are cases in which the parser back-patches by removing some "?!AMP?!" annotations when it has decided that it doesn't need to complain about &&-chain breakage. I'm sure it's fixable, but don't know how important it is at this point. > I wonder if color output for "# chainlint" and "?!" annotations would > help, too. It looks like that may be tricky, though, because the > annotations re-parsed internally in some cases. I had the exact same thought about coloring the "# chainlint:" lines and "?!FOO?!" annotations, and how helpful that could be to anyone (not just newcomers). Aside from not having much free time these days, a big reason I didn't tackle it was because doing so properly probably means relying upon some third-party Perl module, and I intentionally wanted to keep the linter independent of add-on modules. Even without a "coloring" module of some sort, if Perl had a standard `curses` module (which it doesn't), then it would have been easy enough to ask `curses` for the proper color codes and apply them as needed. I'm old-school, so it doesn't appeal to me, but an alternative would be to assume it's safe to use ANSI color codes, but even that may have to be done carefully (i.e. checking TERM and accepting only some whitelisted entries, and worrying about about Windows consoles).