Re: Gating Fedora updates on Fedora CoreOS CI

On 2/13/25 7:01 AM, Neal Gompa wrote:
> On Thu, Feb 13, 2025 at 5:46 AM Clement Verna <cverna@xxxxxxxxxxxxxxxxx> wrote:
>>
>>
>>
>> On Thu, 13 Feb 2025 at 10:13, Dan Horák <dan@xxxxxxxx> wrote:
>>>
>>> On Thu, 13 Feb 2025 09:32:06 +0100
>>> Clement Verna <cverna@xxxxxxxxxxxxxxxxx> wrote:
>>>
>>>> cross posting from
>>>> https://discussion.fedoraproject.org/t/gating-fedora-updates-on-fedora-coreos-ci/144566
>>>>
>>>> Hi all,
>>>>
>>>> Last year, the Fedora CoreOS working group implemented CI testing [1] for
>>>> Bodhi updates on a set of critical packages [2]. Automatic updates are a
>>>> key feature of Fedora CoreOS, and this testing helps us detect
>>>> update-related issues early, improving Fedora’s update stability and reducing
>>>> troubleshooting time.
>>>>
>>>> While our long-term goal is to implement this CI testing in fedora-bootc
>>>> with Bodhi gating integration, there's still significant work ahead before
>>>> we can trigger fedora-bootc tests on Bodhi updates. It's worth noting that
>>>> many of the tests currently running in Fedora CoreOS CI are essentially
>>>> "image mode" tests rather than CoreOS-specific tests. Eventually, we expect
>>>> to migrate these tests to fedora-bootc. However, until that infrastructure
>>>> is ready, enabling gating on the FCOS suite provides immediate image mode
>>>> coverage for critical packages.
>>>>
>>>> Given our experience running these tests, we would like to propose making
>>>> the coreos.cosa.build-and-test test a required gate for package updates in
>>>> rawhide. We've already been successfully gating packages owned by the
>>>> Fedora CoreOS working group [3], and we'd like to extend this requirement
>>>> to the broader package set defined here [4].
>>>>
>>>> Below is the breakdown of passed vs. failed builds by package across more
>>>> than 400 builds, which gives package maintainers an idea of how often an
>>>> update might be gated. It is important to note that not all test failures
>>>> here are related to the software in the proposed Bodhi update, since there
>>>> could be flakes, either due to the test infra environment or due to some
>>>> transient test pipeline misconfiguration. In cases where failures are not
>>>> related to the update, it would be easy to waive the test or coordinate
>>>> with the Fedora CoreOS working group to disable the test.
>>>
>>> gating based on flaky tests or flaky infra is a no-go, sorry ... You
>>> should define an "acceptable false positive" rate first (1%?, 2%?), then
>>> fix tests and infra and then think about gating. Even when half of the
>>> presented failures are not caused by the package under test, it's too
>>> much.
>>
>>
>> We are starting this conversation because we have good confidence that flakes and infrastructure failures are a minority of cases.
>> These tests run on the Fedora infrastructure (OpenShift cluster), so there is no dedicated infra that would need special fixes. Our test framework also has features that make the tests resistant to flakes (we re-run failing tests), and we can easily snooze tests for a period of time if needed. If, with all of this, a critical update is still blocked because of a false positive, it is fairly easy to waive the test in Bodhi (https://docs.fedoraproject.org/en-US/rawhide-gating/faq/#_how_do_i_unblock_an_update).
>>
>> Our tests have caught regressions in the past that landed in Fedora and affected Fedora users, not just Fedora CoreOS users, so we believe there is a lot of value in making these tests blocking.
>>
> 
> Just because it's easy to waive them doesn't mean it's a good idea to
> do so. Also, because Fedora CoreOS doesn't actually follow Fedora with
> updates (y'all have that pool and manifest thingy that lets you skip
> or downgrade freely), I'm not sure it actually makes sense to make it
> a required gate as long as you are doing that.
> 
> Mandatory gating exists for things where shipping it would utterly
> break things without releng intervention. It is not possible to get
> that far for Fedora CoreOS because you neither take in updates
> automatically, nor can users consume them easily.

I think there is a bit of a misunderstanding here. We do have production
streams (next, testing, stable) where we only bump our input lockfiles
when CI has passed on the new package set in the most recent Bodhi updates
compose, but we also have the `rawhide` and `branched` streams that are
literally just whatever is in the repos (i.e. there is no delay here).

Occasionally we will override packages in these streams because we literally
can't build the stream without overriding. See https://github.com/coreos/fedora-coreos-config/pull/3330
for a recent example. All overriding does here is keep `rawhide` building so
that we can continue to validate new packages continuously.
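
To make the distinction between the two kinds of streams concrete, here is a
minimal sketch in Python. It is not the actual FCOS tooling: the `Stream`
class, the `resolve_package_set` function, and the package versions are all
hypothetical and only model the policy described above.

```python
from dataclasses import dataclass, field

# Stream names as described above; everything else here is hypothetical.
PRODUCTION_STREAMS = {"next", "testing", "stable"}
DEVELOPMENT_STREAMS = {"rawhide", "branched"}


@dataclass
class Stream:
    name: str
    lockfile: dict = field(default_factory=dict)   # pinned package -> version
    overrides: dict = field(default_factory=dict)  # manual pins to keep builds working


def resolve_package_set(stream, repo_packages, ci_passed):
    """Return the package set a build of `stream` would use (illustrative model)."""
    if stream.name in PRODUCTION_STREAMS:
        # Production streams only bump the lockfile when CI passed on the
        # new package set; otherwise they keep the previous pins.
        if ci_passed:
            stream.lockfile = dict(repo_packages)
        return dict(stream.lockfile)
    # rawhide/branched take whatever is in the repos, with no delay.
    # Overrides are applied only so the stream keeps building.
    resolved = dict(repo_packages)
    resolved.update(stream.overrides)
    return resolved


repo = {"kernel": "6.14", "systemd": "257"}  # made-up versions
stable = Stream("stable", lockfile={"kernel": "6.13", "systemd": "256"})
rawhide = Stream("rawhide", overrides={"systemd": "256"})

print(resolve_package_set(stable, repo, ci_passed=False))
print(resolve_package_set(rawhide, repo, ci_passed=False))
```

In this model a failing CI run leaves `stable` on its old lockfile, while
`rawhide` still picks up the new kernel immediately and only holds back the
one overridden package.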

> 
> We already have a problem with some tests being hard for update
> submitters to troubleshoot and resolve, I would like to not add more
> of it.

We (FCOS team) do actively look at these tests too and work with maintainers
to try to understand the failures. We've been collaborating in the Matrix room at
https://matrix.to/#/#jenkins-coreos:fedoraproject.org

Dusty