Re: [PATCH v2 10/13] tests: include detailed trace logs with --write-junit-xml upon failure

SZEDER Gábor <szeder.dev@xxxxxxxxx> · Tue, 16 Oct 2018 18:03:48 +0200

On Tue, Oct 16, 2018 at 03:02:38PM +0200, Johannes Schindelin wrote:
> Hi Gábor,
> 
> On Tue, 16 Oct 2018, SZEDER Gábor wrote:
> 
> > On Mon, Oct 15, 2018 at 03:12:12AM -0700, Johannes Schindelin via GitGitGadget wrote:
> > > From: Johannes Schindelin <johannes.schindelin@xxxxxx>
> > > 
> > > The JUnit XML format lends itself to be presented in a powerful UI,
> > > where you can drill down to the information you are interested in very
> > > quickly.
> > > 
> > > For test failures, this usually means that you want to see the detailed
> > > trace of the failing tests.
> > > 
> > > With Travis CI, we passed the `--verbose-log` option to get those
> > > traces. However, that seems excessive, as we do not need/use the logs in
> > 
> > As someone who has dug into a few occasional failures found by Travis
> > CI, I'd say that the output of '--verbose-log -x' is not excessive,
> > but downright essential.
> 
> I agree that the output is essential for drilling down into failures. This
> paragraph, however, talks about the general case: where there are *no*
> failures. See here:

But you don't know in advance whether there will be any failures or
not, so it only makes sense to run all tests with '--verbose-log -x'
by default, just in case a Heisenbug decides to make an appearance.

> > > almost all of those cases: only when a test fails do we have a way to
> > > include the trace.
> > > 
> > > So let's do something different when using Azure DevOps: let's run all
> > > the tests with `--quiet` first, and only if a failure is encountered,
> > > try to trace the commands as they are executed.
> > > 
> > > Of course, we cannot turn on `--verbose-log` after the fact. So let's
> > > just re-run the test with all the same options, adding `--verbose-log`.
> > > And then munging the output file into the JUnit XML on the fly.
> > > 
> > > Note: there is an off chance that re-running the test in verbose mode
> > > "fixes" the failures (and this does happen from time to time!). That is
> > > a possibility we should be able to live with.
> > 
> > Any CI system worth its salt should provide as much information about
> > any failures as possible, especially when it was lucky enough to
> > stumble upon a rare and hard to reproduce non-deterministic failure.
> 
> I would agree with you if more people started to pay attention to our CI
> failures. And if we had some sort of a development model where a CI
> failure would halt development on that particular topic until the failure
> is fixed, with the responsibility assigned to somebody to fix it.
> 
> This is not the case here, though. pu is broken for ages, at least on
> Windows, and even a *single* topic is enough to do that. And this is even
> worse with flakey tests. I cannot remember *how often* I saw CI failures
> in t5570-git-daemon.sh, for example. It is rare enough that it is obvious
> that this is a problem of the *regression test*, rather than a problem of
> the code that is to be tested.

Some occasional failures in t5570 are actually caused by issues in Git
on certain platforms:

  https://public-inbox.org/git/CAM0VKj=MCS+cmOgzf_XyPeb+qZrFmuMH52-PV_NDMZA9X+rRoA@xxxxxxxxxxxxxx/T/#u

> So I would suggest to go forward with my proposed strategy for the moment,
> right up until the time when we have had the resources to fix t5570, for
> starters.

I don't really understand what the occasional failures in t5570 have
to do with the amount of information a CI system should gather about
failures in general.  Or how many people pay attention to it, or what
kind of development model we have, for that matter.  The way I see it
these are unrelated issues, and a CI system should always provide as
much information about failures as possible.  If only a few people pay
attention to it, then for the sake of those few.

> > > Ideally, we would label this as "Passed upon rerun", and Azure
> > > Pipelines even know about that outcome, but it is not available when
> > > using the JUnit XML format for now:
> > > https://github.com/Microsoft/azure-pipelines-agent/blob/master/src/Agent.Worker/TestResults/JunitResultReader.cs
> > > 
> > > Signed-off-by: Johannes Schindelin <johannes.schindelin@xxxxxx>
> >