On Tue, Jul 31, 2018 at 8:50 AM Jeff King <peff@xxxxxxxx> wrote:
> On Mon, Jul 30, 2018 at 05:38:06PM -0400, Eric Sunshine wrote:
> > I considered that, but it doesn't handle nested here-docs, which we
> > actually have in the test suite. For instance, from t9300-fast-import:
> > [...]
> > Nesting could be handled easily enough either by stashing away the
> > opening tag and matching against it later _or_ by doing recursive
> > here-doc folding, however, 'sed' isn't a proper programming language
> > and can't be coerced into doing either of those. (And, it was tricky
> > enough just getting it to handle the nested case with a limited set of
> > recognized tag names, without having to explicitly handle every
> > combination of those names nested inside one another.)
>
> I hesitate to make any suggestion here, as I think we may have passed
> a point of useful cost/benefit in sinking more time into this script.
> But...is switching to awk or perl an option? Our test suite already
> depends on having a vanilla perl, so I don't think it would be a new
> dependency. And it would give you actual data structures.

It would, and I did consider it; however, I was very concerned about
startup cost (launch time) with heavyweight perl, considering that it
would have to be run for _every_ test. With 13000+ tests, that cost
was a very real concern, especially for Windows users, but even for
macOS users (such as myself, for whom the full test suite already
takes probably close to 30 minutes to run, even on a ram drive). So, I
wanted something very lightweight (and deliberately used that word in
the commit message), and 'sed' seemed the lightest-weight of the
bunch.

'awk' might be about as lightweight as 'sed', and it may even be
possible to coerce it into handling the task (since the linter's job
is primarily just a bunch of regex matching with very little
"manipulating"). v1 of the linter was somewhat simpler and didn't deal
with these more complex cases, such as nested here-docs. v1 also did
rather more "manipulating" of the script since the result was meant to
be run by the shell. When it came time to implement v2, which detects
broken &&-chains itself by textual inspection, most of the
functionality (coming from v1) was already implemented in 'sed', so
'awk' never really came up as a candidate since rewriting the script
from scratch in 'awk' didn't seem like a good idea. (And, at the time
v2 was started, I didn't know that these more complex cases would
arise.)

So, 'awk' might be a viable alternative, and perhaps I'll take a stab
at it for fun at some point (or not), but I don't think there's a
pressing need right now.
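
For what it's worth, the "stash away the opening tag and match against
it later" idea from the quoted text is easy to express once real
variables are available. Below is a minimal sketch in Python, purely
for illustration: it is not part of the linter and not a proposal to
add a dependency. The names fold_heredocs and HEREDOC_OPEN are
hypothetical, the sample input is made up rather than copied from
t9300-fast-import, and the sketch assumes at most one here-doc per line
and input lines without trailing newlines. It would also be fooled by a
"<<" that is not a here-doc (such as a shift inside arithmetic
expansion).

```python
import re

# Hypothetical pattern: matches openers such as <<EOF, <<-EOF, <<\EOF,
# <<'EOF' and <<"EOF". Group 1 is the optional "-", group 2 is the tag.
HEREDOC_OPEN = re.compile(r"""<<(-?)\s*[\\'"]?(\w+)""")


def fold_heredocs(lines):
    """Return lines with here-doc bodies and their closing tags dropped."""
    out = []
    tag = None          # stashed opening tag while inside a here-doc
    strip_tabs = False  # True for the <<- form, which ignores leading tabs
    for line in lines:
        if tag is not None:
            candidate = line.lstrip("\t") if strip_tabs else line
            if candidate == tag:
                tag = None  # matching close found; resume normal scanning
            continue        # body lines and the closing tag are swallowed
        out.append(line)
        m = HEREDOC_OPEN.search(line)
        if m:
            strip_tabs = m.group(1) == "-"
            tag = m.group(2)
    return out


# Made-up input: the outer body contains something that looks like
# another here-doc, which must not end the outer one early.
script = [
    "cat >input <<-INPUT_END &&",
    "\tcommit refs/heads/master",
    "\tdata <<EOF",
    "\tinitial",
    "\tEOF",
    "\tINPUT_END",
    "git fast-import <input",
]
folded = fold_heredocs(script)
```

Because only the stashed tag can close the here-doc, the inner "EOF"
line is treated as ordinary body text, so there is no need to enumerate
combinations of tag names nested inside one another.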