On Mon, Mar 07 2022, Junio C Hamano wrote:

> Ævar Arnfjörð Bjarmason <avarab@xxxxxxxxx> writes:
>
>> I think that's a rather strange conclusion given that I've submitted a
>> parallel series that makes some of those failures easier to diagnose
>> than the same changes in this series. I.e. the failures in build
>> v.s. test phases, not the individual test format output (but those are
>> orthagonal).
>
> If you have a counter-proposal that you feel is solid enough, I do
> not mind dropping the topic in question and replacing it with the
> counter-proposal to let people see how it fares for a few days. If
> it allows others to view the output easily if you revert the merge
> of this topic into 'seen' and replace with the counter-proposal and
> push it to your own repository, that would be an even better way to
> highlight the differences of two approaches, as that would allow us
> to see the same failures side-by-side.

I proposed [1] as a counter-proposal to *part of* this series; it has
various UX comparisons. Note that it doesn't aim to change the test
failure output significantly, but to structurally simplify
.github/workflows/main.yml and ci/ for a significant line-count
reduction, to improve part of the output that we get for "make"
vs. "make test", and to make it obvious what the parameters of each
step are.

That partially overlaps with the first half of Johannes's series here;
I think it makes sense to split up those two concerns & address this
more incrementally. I.e. my series shows that you can get what the
first half of this series proposes by handling the GitHub-specific
output in ci/* itself, without any such CI-backend-specific output in
test-lib.sh, and the resulting presentation in the UX is better.

It'll still apply to "master" with this topic ejected; there was a
minor comment from SZEDER Gábor on it (a commit message needed
rephrasing), and I could re-roll it if you're interested.

> Am I correct to understand that one of the the common goals here is
> to eliminate the need to discover how to get to the first failure
> output without turning it slow by 10x to load the output?

That's definitely an eventual common goal. I have a POC at [2] showing
how to do that with an alternate approach that doesn't suffer from
that slowdown, and that gives you much more targeted failure output
(only the specific tests that failed).

I just think it makes sense to separate the concern of how we arrange
.github/workflows/main.yml & ci/* (and how doing that differently can
improve the CI UX) from the mostly orthogonal concern of how
test-lib.sh + prove can best summarize their failure output.

>> I think it's clear that we're going to disagree on this point, but I'd
>> still think that:
>>
>> * In a re-roll, you should amend these patches to clearly note that's a
>>   UX trade-off you're making, perhaps with rough before/after timings
>>   similar to the ones I've posted.
>>
>>   I.e. now those patches say nothing about the UX change resulting in
>>   UX that's *much* slower than before. Clearly noting that trade-off
>>   for reviewers is not the same as saying the trade-off can't be made.
>
> Whether we perform counter-proposal comparison or not, the above is
> a reasonable thing to ask.
>
>> * I don't see why the changes here can't be made configurable (and
>>   perhaps you'd argue they should be on by default) via the ci-config
>>   phase.
>
> I do not know if such a knob is feasible, though.

It would be rather trivial. Basically a matter of adding an
"if: needs.ci-config.outputs.style == 'basic'" to the workflow step
that runs ci/print-test-failures.sh, and a corresponding flag passed
down to ci/lib.sh to have it invoke test-lib.sh with
--github-workflow-markup or not.
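As a rough & untested sketch of the ci/lib.sh side (the
"CI_OUTPUT_STYLE" name is just something I'm making up here for
illustration, it would be whatever the ci-config job ends up passing
down):

    # ci/lib.sh: opt in to the GitHub-specific markup unless the
    # ci-config job asked for the "basic" output style
    if test "$CI_OUTPUT_STYLE" != "basic"
    then
            GIT_TEST_OPTS="$GIT_TEST_OPTS --github-workflow-markup"
    fi
    export GIT_TEST_OPTS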
I.e. this series detects that it's running in GitHub CI and optionally
tells test-lib.sh to emit different output, so to get the old output
you'd just need to not opt in to that.

I think we can have our cake and eat it too though, so I don't think
there's any point in such a knob per se. The only reason I'm
suggesting it is that Johannes doesn't otherwise seem to want to
address the significant CI UX slowdowns in this series.

I do think the approach taken by this series is inherently limited in
solving that problem though, in a way that the approach outlined in
[2] isn't.

I.e. the problem is that we're throwing say ~30k-90k lines of raw CI
output at some GitHub JavaScript to format and present. Your browser
needs to download all of that output, and then spin at 100% CPU to
present it to you.

The reason is that anything that tweaks the test-lib.sh output is
something you need to consume *in its entirety*. I.e. when you have a
sequence like:

    ok 1 test one
    ok 2 test two
    not ok 3 test three
    1..3

you don't know until the third test that you've had a failure. Short
of teaching test-lib.sh even more complexity (e.g. pre-buffering its
"passing" output) a UX layer needs to present all of it, and won't
benefit from a parsed representation of it. Which is why I think
adding other output formatters to test-lib.sh is a dead-end approach
to this problem.

I mean, sure, we could start parsing the new output emitted here, but
that doesn't make sense either. We already run a full TAP parser over
the output of the entire test suite, which we can easily use to only
print details about the tests that failed[2] (see the rough sketch at
the end of this mail). We could also still have the full & raw
output, but that could be in some other tab (or "stage", just like
"ci/print-test-failures.sh" is now).

To be fair, that isn't quite as trivial in terms of patch count. In
particular we currently have edge cases where the TAP output isn't
valid due to bugs in test-lib.sh and/or tests, and where we can't
combine it with the --verbose output. The demo at [2] is on top of
some patches I've got locally to fix all of that.

But I really think it's worth it to move a bit slower there & get it
right, rather than heading down the wrong direction of having another
not-quite-machine-readable output target.

I.e. once it's machine readable & parsed we can present it however we
like, and can do *much better* output, such as correlating trace
output with the test source, and e.g. showing what specific line we
failed on. Right now all of that needs to be eyeballed & inferred from
the "--verbose -x" output (which is often non-obvious & a pain,
especially when the "test_expect_success" block calls helper
functions).

1. https://lore.kernel.org/git/cover-00.25-00000000000-20220221T143936Z-avarab@xxxxxxxxx/
2. https://lore.kernel.org/git/220302.86mti87cj2.gmgdl@xxxxxxxxxxxxxxxxxxx/
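To make the "we already run a full TAP parser" point above a bit more
concrete, here's a rough & untested sketch of the direction. It
assumes the t/test-results/*.{exit,out} layout that
ci/print-test-failures.sh reads today, and the "grep" is only a
stand-in for what the TAP parser we already run via "prove" would give
us properly:

    for exit_file in t/test-results/*.exit
    do
            test "$(cat "$exit_file")" = 0 && continue
            out_file="${exit_file%.exit}.out"
            # only surface the tests that failed, not the whole
            # ~30k-90k line log; a real parsed representation would
            # also let us attach each failed test's trace output here
            grep '^not ok ' "$out_file"
    done

I.e. instead of cat-ing the entire log of a failed test script we'd
only print the parts a reader actually needs to look at.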