Hi Peff,

On Wed, 6 Feb 2019, Jeff King wrote:

> On Tue, Feb 05, 2019 at 11:34:54AM +0100, Johannes Schindelin wrote:
>
> > Peff, you asked at the Contributors' Summit for a way to get notified
> > when CI fails for your patch, and I was hesitant to add it (even if
> > it would be straightforward, really) because of the false positives.
> >
> > This is one such example, as the test fails:
> >
> > https://dev.azure.com/gitgitgadget/git/_build/results?buildId=944
> >
> > In particular, the tests t2024 and t5552 are broken for
> > ma/doc-diff-usage-fix on Windows. The reason seems to be that they
> > are already broken for the base commit that Junio picked:
> > jk/diff-rendered-docs (actually, not the tip of it, but the commit
> > fixed by Martin's patch).
> >
> > Of course, I understand why Junio picks base commits that are far,
> > far in the past (and have regressions that current `master` does not
> > have); it makes sense from the point of view that fixes should be as
> > close as possible to the commits they fix. The downside is that we
> > cannot automate regression testing more than we do now, with e.g.
> > myself acting as curator of test failures.
>
> Thanks for this real-world example; it's definitely something I hadn't
> thought too hard about.
>
> I think this is a specific case of more general problems with testing.
> We strive to have every commit pass all of the tests, so that people
> building on top have a known-good base. But there are many reasons
> that may fail in practice (not exhaustive, but just things I've
> personally seen):
>
>   - some tests are flaky, and will fail intermittently
>
>   - some tests _used_ to pass, but no longer do if the environment has
>     changed (e.g., relying on behavior of system tools that change)
>
>   - historical mistakes, where "all tests pass" was only true on one
>     platform but not others (which I think is what's going on here)
>
> And these are a problem not just for automated CI, but for running
> "make test" locally. I don't think we'll ever get 100% accuracy, so at
> some point we have to accept some annoying false positives. The
> question is how often they come up, and whether they drown out real
> problems. Testing under Linux, my experience with the first two is
> that they're uncommon enough not to be a huge burden.
>
> The third class seems like it is likely to be a lot more common for
> Windows builds, since we haven't historically run tests on them. But
> it would also in theory be a thing that would get better going
> forward, as we fix all of those test failures (and new commits are
> less likely to be built on truly ancient history).

Indeed, you are absolutely right: things *are* getting better.

To me, a big difference is the recent speed-up, which makes it possible
for me to pay more attention to individual branches (they are now
tested, too): if I see a breakage in `pu` or `jch` that I already saw
in one of the individual branches, I won't bother digging further. That
is already a *huge* relief for me.

Granted, I had originally hoped to speed up the test suite so that it
would be faster locally. But I can use the cloud as my computer, too.

> So IMHO this isn't really a show-stopper problem, so much as something
> that is a sign of the maturing test/CI setup (I say "maturing", not
> "mature", as it seems we've probably still got a ways to go).
> As far as notifications go, it probably makes sense for them to be
> something that requires the user to sign up for anyway, so at that
> point they're making their own choice about whether the
> signal-to-noise ratio is acceptable.

Maybe. I do not even know whether there is an option for that in Azure
Pipelines; maybe GitHub offers one? In any case, I just wanted to
corroborate with a real-world example what I mentioned at the
Contributors' Summit: that I would rather not yet script the part where
contributors are automatically notified when their branches don't pass.

> I also think there are ways to automate away some of these problems
> (e.g., flake detection by repeating test failures, re-running failures
> on parent commits to check whether a patch actually introduced the
> problem). But implementing those is non-trivial work, so I am
> certainly not asking you to do it.

Indeed (a rough sketch of that parent-commit re-check is appended at
the end of this mail). The need for this might be a lot more common
than just Git, too, in which case I might catch the interest of some of
my colleagues, who could then implement a solid solution that works not
only for us, but for everybody using Azure Pipelines.

Speaking of which... can we hook it up with https://github.com/git/git,
now that the Azure Pipelines support is in `master`? I sent you and
Junio an invitation to https://dev.azure.com/git/git, so that either
you or Junio (who are the only owners of the GitHub repository) can set
it up. If you want me to help, please do not hesitate to ping me on
IRC.

Ciao,
Dscho
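
P.S.: The parent-commit re-check mentioned above could look roughly
like the following. This is only a sketch of the idea, nothing that
exists anywhere yet: the helper's name, the example test script, the
default of three re-runs and the assumption that it is started from the
top-level directory of a git.git checkout (on a branch) are all made up
for illustration.

  #!/bin/sh
  # Hypothetical helper: given a test script from t/ that just failed
  # in CI, re-run it a few times to detect flakiness, then run it once
  # on the parent commit to see whether the patch introduced the
  # failure.
  #
  # Usage: sh recheck.sh t5552-skipping-fetch-negotiator.sh [<runs>]
  test_script=${1:?need the name of a test script in t/}
  runs=${2:-3}

  run_test () {
    make >/dev/null &&
    ( cd t && sh "./$test_script" )
  }

  # Step 1: repeat the failing test on the current commit.
  failures=0
  i=0
  while test "$i" -lt "$runs"
  do
    run_test || failures=$((failures + 1))
    i=$((i + 1))
  done
  echo "$failures of $runs re-run(s) failed on HEAD"

  # Step 2: run the same test once on the parent commit; if it fails
  # there, too, the breakage was not introduced by the patch under
  # test.
  git checkout --quiet HEAD^
  if run_test
  then
    echo "passes on HEAD^: the patch likely introduced the failure"
  else
    echo "also fails on HEAD^: pre-existing breakage, not the patch"
  fi
  git checkout --quiet -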