On Wed, Mar 02, 2022 at 05:18:26PM +0100, Erik Skultety wrote: > On Wed, Mar 02, 2022 at 04:42:21PM +0100, Jiri Denemark wrote: > > On Wed, Mar 02, 2022 at 09:43:24 +0000, Daniel P. Berrangé wrote: > > > On Wed, Mar 02, 2022 at 08:42:30AM +0100, Erik Skultety wrote: > > > > ... > > > Ultimately when we switch to using merge requests, the integration tests > > > should be run as a gating job, triggered from the merge train when the > > > code gets applied to git, so that we prevent regressions actually making > > > it into git master at all. > > > > > > Post-merge integration testing always exhibits the problem that people will > > > consider it somebody else's problem to fix the regression. So effectively > > > whoever creates the integration testing system ends up burdened with the > > > job of investigating failures and finding someone to poke to fix it. With > > > it run pre-merge then whoever wants to get their code merged needs to > > > investigate the problems. Now sometimes the problems with of course be > > > with the integration test system itself, not the submitters code, but > > > this is OK because it leads to situation where the job of maintaining > > > the integration tests are more equitably spread across all involved and > > > builds a mindset that functional / integration testing is a critical > > > part of delivering code, which is something we've lacked for too long > > > in libvirt. > > > > This is all fine as long as there's a way to explicitly merge something > > even if CI fails. I don't like waiting for a fix in something completely > > unrelated. For example, a MR with translations or an important bug fix > > I haven't studied GitLab enough to rule it out straight away, but I'd like to > have some more intelligent hook (or whatnot) in place that could detect what > type of change is proposed so that documentation changes and translations would > not go through the whole pipeline and instead would be merged almost > immediately (after passing e.g. the docs build). Yep, for translations all we really need todo is trigger the msgmerge commands to validate that printf format specifiers aren't messed up, so that one is quite easy. In theory we could do something similar for docs only changes, but not sure how amenable our meson rules are to a our meson rules dont For docs our 'website' gitlab job only builds the HTML docs skipping the code. So we could potentially filter that too > An important bug fix however must go through the whole pipeline IMO whether we > like it or not, that's the point, we need to ensure that we're not introducing > a regression to the best of our capabilities. As for CI infrastructure issues, > well, unfortunately that is an inevitable part of doing any automated tasks. By > introducing it upstream it should therefore become a collective responsibility, > so that if a CI issue occurs we need to work together to resolve it ASAP rather > than going around it and pushing a fix just because it takes long to execute. I think the idea of skipping an unreliable pipeline is the wrong way to approach it. For something to be gating the expectation has to be that it is reliable. If something is reliable and it fails with a genuine problem, then it is collective responsibility to drop everything and focus on fixing that problem, not skip it. Sometimes this may cause a delay in merging stuff but that is a good thing, as we've certainly had enough cases over the years where obvious trivial fixes were pushed but were in fact broken. If something breaks and the fix is going to be difficult or out of our control, then we might have no choice but to disable the test jobs completely. For this it should be a case of setting an env var in the gitlab config to skip certain job names, which we can unset once the infra is fixed. An example of this would be where the Cirrus CI API changed recently and completely broke cirrus-run and took a week to fix, and I unset the CIRRUS CI token in the repo. In the case that some jobs are not consistently not reliable then there are a few relevant actions to consider. For example we have ultimately decided to make the all the bleeding edge distros be non-gating because Rawhide/Sid/Tumbleweed regularly broke due to their maintainers pushing broken packages. If we have a test of our own that is periodically failing in a non-determinstic way, then we should immediately disable that test in the test suite, rather than skipping gating CI. The test should only be re-enabled once its reliability is fixed. Tests which are gating must be reliable without false positives. In the absolute worst case, someone with admin permissions can just toggle the 'pipelines must succeed' option in the gitlab repo admin UI, but if it gets to that point we've really failed badly. Regards, Daniel -- |: https://berrange.com -o- https://www.flickr.com/photos/dberrange :| |: https://libvirt.org -o- https://fstop138.berrange.com :| |: https://entangle-photo.org -o- https://www.instagram.com/dberrange :|