Re: [PATCH v2 10/13] tests: include detailed trace logs with --write-junit-xml upon failure

Johannes Schindelin <Johannes.Schindelin@xxxxxx> · Tue, 16 Oct 2018 15:02:38 +0200 (DST)

Hi Gábor,

On Tue, 16 Oct 2018, SZEDER Gábor wrote:

> On Mon, Oct 15, 2018 at 03:12:12AM -0700, Johannes Schindelin via GitGitGadget wrote:
> > From: Johannes Schindelin <johannes.schindelin@xxxxxx>
> > 
> > The JUnit XML format lends itself to be presented in a powerful UI,
> > where you can drill down to the information you are interested in very
> > quickly.
> > 
> > For test failures, this usually means that you want to see the detailed
> > trace of the failing tests.
> > 
> > With Travis CI, we passed the `--verbose-log` option to get those
> > traces. However, that seems excessive, as we do not need/use the logs in
> 
> As someone who has dug into a few occasional failures found by Travis
> CI, I'd say that the output of '--verbose-log -x' is not excessive,
> but downright essential.

I agree that the output is essential for drilling down into failures. This
paragraph, however, talks about the general case: where there are *no*
failures. See here:

> > almost all of those cases: only when a test fails do we have a way to
> > include the trace.
> > 
> > So let's do something different when using Azure DevOps: let's run all
> > the tests with `--quiet` first, and only if a failure is encountered,
> > try to trace the commands as they are executed.
> > 
> > Of course, we cannot turn on `--verbose-log` after the fact. So let's
> > just re-run the test with all the same options, adding `--verbose-log`.
> > And then munging the output file into the JUnit XML on the fly.
> > 
> > Note: there is an off chance that re-running the test in verbose mode
> > "fixes" the failures (and this does happen from time to time!). That is
> > a possibility we should be able to live with.
> 
> Any CI system worth its salt should provide as much information about
> any failures as possible, especially when it was lucky enough to
> stumble upon a rare and hard to reproduce non-deterministic failure.

I would agree with you if more people started to pay attention to our CI
failures. And if we had some sort of a development model where a CI
failure would halt development on that particular topic until the failure
is fixed, with the responsibility assigned to somebody to fix it.

This is not the case here, though. pu is broken for ages, at least on
Windows, and even a *single* topic is enough to do that. And this is even
worse with flakey tests. I cannot remember *how often* I saw CI failures
in t5570-git-daemon.sh, for example. It is rare enough that it is obvious
that this is a problem of the *regression test*, rather than a problem of
the code that is to be tested.

So I would suggest to go forward with my proposed strategy for the moment,
right up until the time when we have had the resources to fix t5570, for
starters.

Ciao,
Dscho

> > Ideally, we would label this as "Passed upon rerun", and Azure
> > Pipelines even know about that outcome, but it is not available when
> > using the JUnit XML format for now:
> > https://github.com/Microsoft/azure-pipelines-agent/blob/master/src/Agent.Worker/TestResults/JunitResultReader.cs
> > 
> > Signed-off-by: Johannes Schindelin <johannes.schindelin@xxxxxx>
>