Re: [PATCH 1/1] ci: split the `linux-gcc` job into two jobs

Johannes Schindelin <Johannes.Schindelin@xxxxxx> · Fri, 14 Jun 2019 21:35:26 +0200 (CEST)

Hi Gábor,

On Thu, 13 Jun 2019, SZEDER Gábor wrote:

> On Thu, Jun 13, 2019 at 06:51:04PM +0200, Johannes Schindelin wrote:
>
> > On Thu, 13 Jun 2019, Junio C Hamano wrote:
> >
> > > SZEDER Gábor <szeder.dev@xxxxxxxxx> writes:
> > >
> > > > On Thu, Jun 13, 2019 at 05:53:51AM -0700, Johannes Schindelin via
> > > > GitGitGadget wrote:
> > > >> From: Johannes Schindelin <johannes.schindelin@xxxxxx>
> > > >>
> > > >> This job was abused to not only run the test suite in a regular way but
> > > >> also with all kinds of `GIT_TEST_*` options set to non-default values.
> > > >>
> > > >> Let's split this into two
> > > >
> > > > Why...?
> > > >
> > > >> with the `linux-gcc` job running the default
> > > >> test suite, and the newly-introduced `linux-gcc-extra` job running the
> > > >> test suite in the "special" ways.
> > > >>
> > > >> Technically, we would have to build Git only once, but it would not be
> > > >> obvious how to teach Travis to transport build artifacts, so we keep it
> > > >> simple and just build Git in both jobs.
> > >
> > > I had the same reaction.
> >
> > So basically you are saying that the cover letter was the wrong location
> > for this:
> >
> > 	For people like me, who often look at our CI builds, it is hard to
> > 	tell whether test suite failures in the linux-gcc job stem from
> > 	the first make test run, or from the second one, after setting all
> > 	kinds of GIT_TEST_* variables to non-default values.
>
> Is this really an issue in practice?

I don't think that this is the right question. The right question would
be: is this issue possible? And the answer is: yes, quite. The clang and
the GCC toolchains are different enough that they have different bugs and
strengths. And the test suite with extra knobs vs without them *also* is
different enough to expose different bugs. So obviously, you would want
to discern between them [*1*].

But I can even answer the wrong question. The answer is still: yes, quite.

For example, I saw quite a few flaky tests "prefer" one over the other. I
do not recall the specifics (as I investigated at least half a dozen
flaky tests in the past months, and I am prone to confuse them with one
another), but I distinctly remember debugging via patching
azure-pipelines.yml and ci/ heavily, using the one job that was failing *a
lot* more often (and deleting all the other jobs from that .yml file,
which accelerated the turn-around time, which is *everything* in
debugging).

And even if I had not experienced this: as I said, clang and GCC are
different enough; that's why we have both jobs in the first place. It
sounds rather curious to me that you suggest, further below, that they
essentially do the same:

> In my experience there are only two (and a half) cases:
>
>   - if both the 'linux-gcc' and 'linux-clang' build jobs fail, then
>     it's some sort of a general breakage.

Sure, that's the easy case.

What I want to help with in this patch are the *hard* cases.

>   - if only the 'linux-gcc' build job fails, the 'linux-clang'
>     succeeds, then it's a breakage in the test run with the various
>     'GIT_TEST_*' test knobs enabled (unless the failing 'linux-gcc'
>     build job's runtime is below, say, 5 minutes, in which case it's a
>     build error only triggered by GCC(-8), and, as I recall, is rather
>     rare).

So what you are suggesting is that the part of the `linux-gcc` job where
it tests without all those knobs is totally useless because `linux-clang`
already tested the same stuff?

That does not sound right.

Because by that token, you would want to simply remove that part from the
`linux-gcc` job (instead of splitting out the rest, as my patch does).

I refuse to believe that you are saying that.

That would sound almost like "We don't need the test suite because 99.9%
of all test cases pass, anyway".

> > 	Let's make it easier on people like me.
> >
> > 	This also helps the problem where the CI builds often finish the
> > 	other jobs waaaay before linux-gcc finally finishes
>
> This is not the case on Travis CI, where the runtime of the macOS
> build jobs are far the longest, so this change won't help anything
> there...

Right, Travis' macOS agents are ridiculously slow.

> on the contrary, it would make things slower by spending time on
> installing dependencies and building Git in one more build job.

No, it wouldn't. Because instead of waiting for the macOS jobs and the
linux-gcc job, we would only wait for the macOS jobs.

The fallacy here is that the 2-3 minutes spent on *two* agents instead of
*one* would add 2-3 minutes to the overall build time. They would not: the
two jobs run in parallel.

And once Travis gets faster macOS agents, the Travis build will be overall
faster (instead of now waiting for `linux-gcc` all the time).
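
To put numbers on the parallel-vs-accumulated argument, here is a rough
sketch. All durations and the `macos` job name are made up for
illustration (they are not measurements), and the sketch assumes that
every job gets its own agent right away:

```python
# Hypothetical durations in minutes; illustrative only, not measured.
SETUP = 3         # install dependencies + build Git, paid once per job
DEFAULT_RUN = 10  # test suite with default settings
KNOBS_RUN = 10    # test suite with the GIT_TEST_* knobs set


def build(macos, split):
    """Return {job name: duration} for one CI build."""
    jobs = {"macos": macos, "linux-clang": SETUP + DEFAULT_RUN}
    if split:
        jobs["linux-gcc"] = SETUP + DEFAULT_RUN
        jobs["linux-gcc-extra"] = SETUP + KNOBS_RUN
    else:
        jobs["linux-gcc"] = SETUP + DEFAULT_RUN + KNOBS_RUN
    return jobs


def wall_clock(jobs):
    """Jobs run side by side, so the build takes as long as the slowest job."""
    return max(jobs.values())


def agent_minutes(jobs):
    """Total time consumed across all agents."""
    return sum(jobs.values())


for macos in (30, 12):  # slow vs. (hypothetically) faster macOS agents
    for split in (False, True):
        jobs = build(macos, split)
        print(f"macos={macos} split={split}: "
              f"wall-clock {wall_clock(jobs)}, agent time {agent_minutes(jobs)}")
```

Under these assumptions, the extra setup shows up only in the summed
agent time. The wall-clock time stays the same as long as the macOS job
is the slowest one, and it drops once `linux-gcc` would otherwise have
been the slowest.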

Or am I missing anything obvious? I am quite puzzled by your objections,
given your experience with the CI builds. You, too, have *got* to have
experienced the benefits of parallelizing longer-running jobs.

To me, it looks like a no-brainer to split apart a long-running job, to
benefit from running jobs side by side.

Of course, there is also the presentation of the test results, but then,
Travis does not have that. You cannot publish the test results in a visual
manner, nor analyze breakages over time. So in Travis, it matters less
than in Azure Pipelines (although not by much) what the name of the job is
in which a test failed; Travis really leaves the developer struggling to
get to the root cause by digging through the entire log. In Azure
Pipelines, I click on the Tests tab (see e.g.
https://dev.azure.com/git/git/_build/results?buildId=677&view=ms.vss-test-web.build-test-results-tab)
and I see immediately not only which test script and which test case
failed (with the corresponding part of the verbose output just a click on
the test case title away), but also in which job it failed, which can help
me debug a lot faster. Also, the analytics section allows me to see in
which jobs tests failed consistently.

And with the split I proposed, it would be obvious from that page, at one
glance, whether I need to use the GIT_TEST_* knobs to reproduce a test
failure locally or not.

So: I am still very, very puzzled why you think it to be a good idea to
have a job that runs twice as long as all the other Linux jobs, that makes
regressions harder to investigate than necessary, and that makes the
overall analysis e.g. of flaky tests more difficult than with my patch.

Ciao,
Dscho

Footnote *1*: Now, a question that Junio raised was whether we should have
the test runs with the GIT_TEST_* knobs *also* for clang. Alas, here I
would like to throw in the argument that a "too complete" test suite is so
useless as to be a wasted effort because *nobody runs it if it takes too
long*. And given the impression I have that Junio does not bother looking
at the CI builds, I wonder why he wanted this in the first place, it's not
like it would benefit him.

> >       too: linux-gcc and
> > 	linux-gcc-extra can be run in parallel, on different agents.
>