On Thu, Mar 24 2022, Johannes Schindelin wrote: > Hi Ævar, > > On Wed, 23 Mar 2022, Ævar Arnfjörð Bjarmason wrote: > >> Change the "--immediate" option so that it emits valid TAP on >> failure. Before this it would omit the required plan at the end, >> e.g. under SANITIZE=leak we'd show a "No plan found in TAP output" >> error from "prove": >> >> $ prove t0006-date.sh :: --immediate >> t0006-date.sh .. Dubious, test returned 1 (wstat 256, 0x100) >> Failed 1/22 subtests >> >> Test Summary Report >> ------------------- >> t0006-date.sh (Wstat: 256 Tests: 22 Failed: 1) >> Failed test: 22 >> Non-zero exit status: 1 >> Parse errors: No plan found in TAP output >> Files=1, Tests=22, 0 wallclock secs ( 0.02 usr 0.01 sys + 0.18 cusr 0.06 csys = 0.27 CPU) >> Result: FAIL >> >> Now we'll emit output that doesn't result in TAP parsing failures: >> >> $ prove t0006-date.sh :: --immediate >> t0006-date.sh .. Dubious, test returned 1 (wstat 256, 0x100) >> Failed 1/22 subtests >> >> Test Summary Report >> ------------------- >> t0006-date.sh (Wstat: 256 Tests: 22 Failed: 1) >> Failed test: 22 >> Non-zero exit status: 1 >> Files=1, Tests=22, 0 wallclock secs ( 0.02 usr 0.00 sys + 0.19 cusr 0.05 csys = 0.26 CPU) >> Result: FAIL > > The commit message is strong on the what, very strong in giving verbose > output that might or might not clarify the intention, and a little weak in > the why and the greater context. I thought "so that it emits valid TAP" was sufficiently self-explaining. I.e. we emit this machine-readable format, but in this edge case our output is invalid TAP, now it's valid. Re the verbose output: The last time I submitted patch for tweaking the TAP output I got the feedback (from Junio in https://lore.kernel.org/git/xmqq7dad10oy.fsf@gitster.g/) that a more explicit example was wanted. So I included it this time around. > In the broader story arc, I wonder why we're even bothering with TAP > anymore because it seems that the world has moved on to more powerful file > formats to represent test output, such as JUnit XML, that can be parsed, > rendered and analyzed by powerful special-purpose applications. [I guess that warrants a $subject change] A few things, grouped by heading: == It "can be parsed" First, the "[TAP ]can['t] be parsed" part of that is wrong. As your own CI patch series[1] shows we currently parse the entire TAP output of our test suite on every CI run. Otherwise prove(1) couldn't have come up with this (quoting from your [1]): Test Summary Report ------------------- t1092-sparse-checkout-compatibility.sh (Wstat: 256 Tests: 53 Failed: 1) Failed test: 49 Non-zero exit status: 1 t3701-add-interactive.sh (Wstat: 0 Tests: 71 Failed: 0) TODO passed: 45, 47 That's the result of a compliant TAP parser parsing all of our test output, which has worked for all our tests (sans edge cases like the one being fixed here) for a long time. Let's not go into how you don't like that summary output, that's not the point. That's a UX layer presenting something from the fully parsed output, the point is that it *is* not only parsable, but it's currently *being parsed* by our toolchain on every CI run. == Human readable Unless you're expecting us to eyeball something like this verbose XML on ad-hoc runs like: $ ./t0013-sha1dc.sh --write-junit-xml --tee ok 1 - test-sha1 detects shattered pdf # passed all 1 test(s) 1..1 Which for JUnit is: $ cat out/TEST-t0013-sha1dc.xml <testsuites> <testsuite name="t0013-sha1dc" timestamp="2022-03-24T14:20:27" time="0.065762"> <testcase name="t0013.1 test-sha1 detects shattered pdf
" classname="t0013" time="0.029818"> </testcase> </testsuite> </testsuites> You're going to end up with 2 output formats, a human readable one and some secondary also-machine-readable output. TAP isn't perfect, but it's uniquely suitable as a format for the sort of testing we do, because it's explicitly machine-readable format that's designed to also be human readable. Which isn't a goal with JUnit. Perhaps some XML enthusiasts would argue that point. Sure, you *can* read it, but it's not exactly pleasing nor aiming to reduce verbosity. The JUnit we currently emit is >2x the verbosity of TAP by lines, and ~3x as much by byte count (all of which a human needs to occasionally skim): $ rm -rf out; ./t0001-init.sh --write-junit-xml --tee |wc ; wc <out/*xml 63 488 2553 130 615 7592 == It's trivial to produce Most output formats are way more picky about what constitutes a valid payload than TAP is, which is why for the TAP v.s. JUnit output the number of syscalls we need shows this drastic difference: $ strace --summary-only -f ./t0001-init.sh --tee 2>&1 | tail -n 1 100.00 3.703689 21 171581 13587 total $ strace --summary-only -f ./t0001-init.sh --tee --write-junit-xml 2>&1 | tail -n 1 100.00 6.424080 22 285903 21097 total I.e. the implementation of it just needs an "echo" or whatever, whereas the XML one is shelling out for pretty much every line it prints to escape things. I'm you might argue about the hatefulness about shellscripting there and how we should rewrite everything in C. I don't disagree with that point in the abstract, but realistically we'll be needing to deal with the shellscript-based test suite for the foreseeable future. == It's more flexible The JUnit we write out is just the summary output, but when you run a test with e.g. "--verbose -x" the result is valid TAP. I.e. TAP parser will take this: $ ./t0013-sha1dc.sh -vx 2>&1|wc 38 165 1104 And grab the lines matching "ok" "not ok" etc., and *ignore everything else*. This means that we're able to use it for things that you just couldn't use JUnit for, e.g. the POC I have here where we pair up our "--verbose -x" output with the failure summary, i.e. show what specific test things belongs t [2]: == It doesn't matter if nobody else uses it Or: "we don't need a 2nd machine-parsable format". I think your "the world has moved on to more powerful file formats to represent test output" is a valid concern. I.e. we don't want to have a test suite output that we can't pretty render on whatever the latest fancy test CI renderer is is a completely valid concern, whether that's Azure or GitHub CI. But the implicit assumption there seems to be that TAP isn't rather trivially machine-parsable, and that we aren't already running a parser for it. It is, and we are. Because of that we can just have a post-transformation step of changing TAP into whatever other format you'd like. So you can have JUnit, GitHub CI Markdown or whatever, it should just be produced from the TAP output. I think that your addition of JUnit & GitHub CI Markdown (in [1]) is much better done like that. It avoids the complexity of N output formats in test-lib.sh. A transformer for your desired format probably exists already (or could trivially be adjusted for a new format): https://metacpan.org/search?q=TAP%3A%3AFormatter == Everyone uses it Well, <citation needed>, but while per the above I don't think wide use of TAP even matters I think being concerned about lack of tooling support for it is strange. I'm fairly sure it's the single most widely supported test output format of any kind. Because of how trivial it is to parse and implement I'm pretty sure it's the most widely-supported machine-readable test output format of any kind. *Maybe* something like JUnit has (near) parity if we're counting format transformers-to-transformers, but I'm fairly sure not if it wouldn't be allowed to "cheat" by making use of TAP as an intermediate format. == Cases where TAP isn't good enough currently Someone familiar with our test tooling might thing "wait, TAP with --verbose is valid TAP?" because since 614fe015212 (test-lib: bail out when "-v" used under "prove", 2016-10-22) we have been refusing to produce it in that case: $ prove t0013-sha1dc.sh :: --verbose Bailout called. Further testing stopped: verbose mode forbidden under TAP harness; try --verbose-log FAILED--Further testing stopped: verbose mode forbidden under TAP harness; try --verbose-log Fixing that is needed for the approach in [2] to work 100% of the time, I have working patches to that (which as it happens, are currently blocking on conflicts with your [1]). I.e. the problem is that if we're expecting a line starting with "ok" or whatever the firehose --verbose output from the test itself might emit such a line, and confuse the parser. The solution was rather trivial in the case of TAP, just have test-lib.sh emit a configurable prefix string before any TAP syntax it emits, then have a "tee" tool we ship as a test helper munge that after the fact: https://github.com/avar/git/commit/2358ff8c3a7 https://github.com/avar/git/commit/2ae430bfce7 == Handling the long-tail of machine-readability Continuing the above: I don't think a solution to that sort of thing is really viable with anything that doesn't behave like TAP. We have tests doing things like starting a background daemon that will racily emit output its output *in the middle of another* test. As in not theoretically, it's a case I actually had to debug & test to get the above commits working. The JUnit emitter punts on it entirely by not including the "--verbose -x" output at all. The rather easy solution for TAP explicitly relies on its "non-syntax" of considering every line it doesn't explicitly classify as syntax as something it'll pass through as "output". Those lines thenh need no further escaping. They can pretty much be latin-1, UTF-8, your favorite bits from /dev/urandom etc.. It doesn't matter. Whereas for e.g. any XML, JSON etc. format where things are syntax by default you'd need to intercept all that output to strip "<" and the like from it. Even if you did that you probably wouldn't have anywhere sensible to stick that output, you probably already emitted the beginning/end markers for "here's the verbose output for test N" already, which would require complexity a TAP emitter wouldn't need. > Apart from taking reviewer time away from such more powerful avenues to > make sense of our failing tests, the diff does not really hurt, so I have > no objections against integrating this patch. I hope the above will summarize why those suggested avenues aren't "more powerful", quite the opposite. I think if you did the legwork of trying to make those formats represent our "--verbose -x" output in a machine-readable way and tried to roundtrip that parsed output (which I have done for TAP) that you'd find it somewhere between "much, much harder" and "impossible" to do the same for those other formats in the context of our test suite. So yes, I agree that there's a lack of focus here, and we should put more wood behind fewer arrows in terms of making our test output machine-parsable. The [2] I mentioned above shows the sorts of wins we can get from that, i.e. emitting *only* the lines of test output relevant to the specific failures we had. That's really useful, and something you inherently can't do with the sort of approach you're going for in your [1] series. At least not without buffering the whole thing & parsing it, at which point you're back to square #1 in terms of fixing the sorts of issues noted in "Handling the long-tail of machine-readability" above. 1. https://lore.kernel.org/git/pull.1117.git.1643050574.gitgitgadget@xxxxxxxxx/ 2. https://lore.kernel.org/git/220302.86mti87cj2.gmgdl@xxxxxxxxxxxxxxxxxxx/