Re: [PATCH v2 0/9] ci: make Git's GitHub workflow output much more helpful

Ævar Arnfjörð Bjarmason <avarab@xxxxxxxxx> · Mon, 23 May 2022 11:05:37 +0200

On Sat, May 21 2022, Junio C Hamano wrote:

> Johannes Schindelin <Johannes.Schindelin@xxxxxx> writes:
>
>>> * print the verbose logs only for the failed test cases (to massively cut
>>>   down on the size of the log, particularly when there's only a couple
>>>   failures in a test file with a lot of passing tests).
>>
>> That's an amazingly simple trick to improve the speed by a ton, indeed.
>> Thank you for this splendid idea!
>>
>>> * skip printing the full text of the test in 'finalize_test_case_output'
>>>   when creating the group, i.e., use '$1' instead of '$*' (in both passing
>>>   and failing tests, this information is already printed via some other
>>>   means).
>>>
>>> If you wanted to make sure a user could still access the full failure logs
>>> (i.e., including the "ok" test results), you could print a link to the
>>> artifacts page as well - that way, all of the information we currently
>>> provide to users can still be found somewhere.
>>
>> That's a good point, I added that hint to the output (the link is
>> unfortunately not available at the time we print that advice).
>
> https://github.com/git/git/runs/6539786128 shows that all in-flight
> topics merged to 'seen', except for the ds/bundle-uri-more, passes
> the linux-leaks job.  The ds/bundle-uri-more topic introduces some
> leaks to commands that happen to be used in tests that are marked as
> leak-checker clean, making the job fail.
>
> Which makes a great guinea pig for the CI output improvement topic.
>
> So, I created two variants of 'seen' with this linux-leaks breakage.
> One is with the js/ci-github-workflow-markup topic on this thread.
> The other one is with the ab/ci-github-workflow-markup topic (which
> uses a preliminary clean-up ab/ci-setup-simplify topic as its base).
> They should show the identical test results and failures.
>
> And here are their output:
>
>  - https://github.com/git/git/runs/6539835065
>  - https://github.com/git/git/runs/6539900608
>
> If I recall correctly, the selling point of the ab/* variant over
> js/* variant was that it would give quicker UI response compared to
> the former, but other than that, both variants' UI are supposed to
> be as newbie friendly as the other.

...

> When I tried the former, it reacted too poorly to my attempt to
> scroll (with mouse scroll wheel, if it makes a difference) that
> sometimes I was staring a blank dark-gray space for a few seconds
> waiting for it to be filled by something, which was a bit jarring
> experience.  When I tried the latter, it didn't show anything to
> help diagnosing the details of the breakage in "run make test" step
> and the user needed to know "print test failures" needs to be looked
> at, which I am not sure is an inherent limitation of the approach.
> After the single extra click, navigating the test output to find the
> failed steps among many others that succeeded was not a very pleasant
> experience.
>
> Those who are interested in UX experiment may want to visit these
> two output to see how usable each of these is for themselves.

Re selling point & feature comparison: The point of the ab/* variant was
to re-roll Johannes's onto a "base" topic that made much of his
unnecessary, because the building up of features to emit GitHub markup
can be replaced by unrolling things like "make" and "make test" to the
top-level.

That has its own UX benefits, e.g. you can see at a glance what command
was run and what the environment was, and "make" and "make test" are now
split up from one monolithic "build and test" step.

But the primary intention was not to provide a prettier UX, but to show
that this arrangement made sense. I was hoping that Johannes would reply
with some variant of "ah, I see what you mean, that does make things
simpler!" and run with it, but alas...

So small bits in the UX like what you pointed out with needing an extra
click are in there, that one would be easy to solve, it's because we
"focus" on the last step with a false exit code, so we'd just have to
arrange for the "print" step to be that step.

Anyway, the v3 CL of Johannes's series claims that the re-roll "improves
the time to load pages".

I ran both the Firefox and Chrome debugger with performance benchmarks
against:

    https://github.com/git/git/runs/6540394142

And:

    https://github.com/avar/git/runs/6551581584?check_suite_focus=true

The former is what Johannes noted as the correct v3 in
https://lore.kernel.org/git/nycvar.QRO.7.76.6.2205222045130.352@xxxxxxxxxxxxxxxxx/,
the latter is the current "seen" with ab/ci-github-workflow-markup
reverted, i.e. just my "base" changes.

In Chrome/Firefox the time to load the page (as in the spinner stops,
and we "focus" on the right content) is:

    JS: ~60s / ~80s 
    Æ: ~25s / ~18s

This is with Chrome Version 101.0.4951.54 (Official Build) (64-bit) and
Firefox 91.8.0esr (64-bit), both on a Debain Linux x86_64 Dell laptop.

The case of Chrome is quite revealing (since its developer tools seem to
show a better summary). It shows that the "Æ" version spent ~200ms on
"scripting", ~1ms on "rendering", and ~20k ms "idle".

For "JS" that's ~30k ms on "scripting", 15k ms on "rendering", then 7k
ms on "painting" (which is ~0ms in the other one). 7k ms are spent on
"idle".

So these are basically the same performance results as I reported in
earlier iterations.

I think a v4 of this series really deserves a much less terse
CL. I.e. there are specific reports about major slowdowns in the UX. Any
re-roll should really be re-testing those with the same/similar software
and reporting before/after results.

Clearly the primary goal of improving the CI UX should not be to
optimize the rendering of the results as a goal in itself, but in this
case it becomes *so slow* that it precludes certain major use-cases,
such as seeing a failure and being able to view it in some timely
fashion.