Re: What is this update waiting on?

Adam Williamson <adamwill@xxxxxxxxxxxxxxxxx> · Wed, 12 Jul 2023 11:37:38 -0700

On Wed, 2023-07-12 at 18:37 +0200, Fabio Valentini wrote:
> On Wed, Jul 12, 2023 at 6:26 PM Richard W.M. Jones <rjones@xxxxxxxxxx> wrote:
> > 
> > 
> > https://bodhi.fedoraproject.org/updates/FEDORA-2023-8d5b08b005
> > 
> > I don't understand what this update is waiting on / why it cannot go
> > to Fedora immediately.
> > 
> > I waived the tests.
> > 
> > There are still apparently 5 tests "running", but clicking through to
> > the Automated Tests tab shows only 4.  There's no way to show the
> > status of these 4 tests, like are they running now, are they waiting
> > for something, why do they need to run at all if I waived them?
> 
> This might be related to the recent enablement of OpenQA tests for
> critpath packages in rawhide? I'm not sure whether you can waive
> OpenQA tests that are still running, or whether you can only waive
> actual failures once they happen.
> 
> I'd just wait a bit. With updates this large, even signing all the
> packages takes a while, and then OpenQA tests run ...
> That said, it's also not *that* unusual to see rawhide updates be in
> "testing" state for more than an hour now (at least this also happened
> to one of my recent updates). Probably depends on how large the queue
> of OpenQA tests is.

About 90 minutes of test time is the minimum (at least for updates
that require the tests discussed below), because two of the gating
tests follow this process:

* Build a live image (using the update)
* Run an install from that live image
* Test the installed system boots

for KDE and GNOME. Another is:

* Deploy an N-1 FreeIPA server and client
* Upgrade both server and client to N, with the update included
* Check everything works

All of these tests inherently take quite a long time, because building
live images and running an install from them takes time, and because
doing FreeIPA deployments and system upgrades takes time. The required
time usually works out to be about 90 minutes. I can't really optimize
these tests much beyond their current state. I believe they are
important and necessary tests, so they are in the gating list and
updates have to wait for them. This has been the case for stable
releases for quite a long time at this point; it's only relatively
"new" for Rawhide.

There is another set of tests which takes even longer - build a
Silverblue ostree, then build an ostree installer image from that
ostree, then install it and test it works - but that set of tests is
not currently in the gating set, so it does not need to complete before
the update goes stable.

I believe Fabio is correct that - as the system is currently designed -
you can't effectively waive a running test. You can file the waiver,
but I believe the gating status calculation only considers waivers for
*failed* tests, not queued or running ones. Honestly, I think this is
the right design; waiving is intended to mean "I have examined this
failure and I'm very sure it was a false one", not "I cannot possibly
wait for this test to complete!" However, the UI design around this
could clearly be improved: if we aren't going to apply waivers to
running/queued tests we probably shouldn't allow Bodhi to *file* them
at that point, and if one somehow exists, Bodhi probably shouldn't
indicate that it's a "live" waiver on the Automated Results tab. Here's
a Bodhi issue on that:
https://github.com/fedora-infra/bodhi/issues/5414
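
To make the waiver behaviour concrete, here is a rough Python sketch of
the rule as I described it above. This is not Bodhi's or Greenwave's
actual code: the CheckResult type, the gating_satisfied function, the
outcome strings and the test names are all made up for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CheckResult:
    # Hypothetical record of one automated test for an update.
    name: str
    outcome: str          # "passed", "failed", "running" or "queued"
    required: bool        # True if the test is in the gating set
    waived: bool = False  # True if someone filed a waiver for it


def gating_satisfied(results):
    """Return True only if no required test still blocks the update."""
    for r in results:
        if not r.required:
            continue  # non-gating tests never block the update
        if r.outcome == "passed":
            continue
        if r.outcome == "failed" and r.waived:
            continue  # a waiver only counts against an actual failure
        return False  # unwaived failure, or still running/queued
    return True


# Illustrative data only; these are not real openQA test names.
results = [
    CheckResult("live-build-kde", "running", required=True, waived=True),
    CheckResult("silverblue-installer", "running", required=False),
]
print(gating_satisfied(results))  # False: the waiver on a running test is ignored
```

Under this rule, a waiver filed early only takes effect if the test
later fails; if the test passes, the waiver was never needed.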

I appreciate that waiting can be a bit frustrating, but I believe it's
a necessary compromise for the substantial increase in quality that
gating on these tests can afford. I'd really hope folks can live with
waiting (usually) 90 minutes before the update is tagged. The waiver
messages Richard created gave the justification "I'll install and test
the packages myself"; unless you're doing all the tests openQA does,
including tests of image builds and installs, you're not necessarily
going to find all the problems it would.

In this case, it took a bit longer than 90 minutes - more like two
hours.

Looking into why: the FreeIPA upgrade test wasn't applied to this
update (as it's not in the "relevant" critical path group), but the
live install tests were (because the update is in the
'critical-path-compose' group, probably because of libguestfs). The KDE
live build test took 1hr 14mins and the 'install and boot' test took
20mins 13secs, so that's only just over 90 mins. But there was a bit of
a delay starting the tests: the live build test started at 15:56,
while the update was created at 15:29.

It seems, lately, like there's sometimes a delay between an update
being created and the message that triggers openQA to test it being
published, and that's what happened in this case: the message that
triggered the tests,
https://apps.fedoraproject.org/datagrepper/v2/id?id=15d5bad4-32a5-47e0-abe3-17142c7ab86a&is_raw=true&size=extra-large
, was published at 15:54, 25 minutes after the update was created.

openQA can't start testing the update till that message is published.
I'm not sure why these messages have sometimes been published late
recently, but I'll try to look into it if I can.
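
For anyone who wants to check the arithmetic, here is a small Python
sketch that adds up the times quoted above. The variable names are
mine, the calendar date is assumed from the date of this thread (only
the differences matter), and it assumes the install test started as
soon as the live build finished, which may not be exactly true.

```python
from datetime import datetime, timedelta

# Timestamps quoted above; the date itself is an assumption.
update_created = datetime(2023, 7, 12, 15, 29)
message_published = datetime(2023, 7, 12, 15, 54)
live_build_started = datetime(2023, 7, 12, 15, 56)

# Durations quoted above for the KDE tests.
live_build = timedelta(hours=1, minutes=14)
install_and_boot = timedelta(minutes=20, seconds=13)

trigger_delay = message_published - update_created  # wait for the message
start_delay = live_build_started - update_created   # wait until tests began
test_time = live_build + install_and_boot           # assumes no gap between tests
total = start_delay + test_time

print(trigger_delay)  # 0:25:00
print(start_delay)    # 0:27:00
print(test_time)      # 1:34:13
print(total)          # 2:01:13
```

So nearly all of the extra time beyond the usual 90 minutes or so is
the delay before the triggering message was published.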

Oh, the answer to one other question Richard asked: Bodhi indicates
which tests are "required" with a black asterisk in the Automated
Results page.
-- 
Adam Williamson (he/him/his)
Fedora QA
Fedora Chat: @adamwill:fedora.im | Mastodon: @adamw@xxxxxxxxxxxxx
https://www.happyassassin.net
