Re: ppc64le builds taking ages

Kevin Fenzi <kevin@xxxxxxxxx> · Fri, 19 May 2023 10:33:20 -0700

On Fri, May 19, 2023 at 05:19:01PM +0200, Fabio Valentini wrote:
> 
> I've been experiencing similar issues with ppc64le koji builds for the
> past few weeks. They are now by far the slowest architecture, and
> sometimes the build tasks are seemingly just "hanging" or "stuck",
> often for half an hour or longer. Most frequently the tasks look
> locked up doing disk IO, for example, during dnf's or rpm's
> transaction checks (i.e. when installing the buildroot or build
> dependencies).

Often when this happens, it's because the virthost that builder is on
has gone unresponsive and I have to reboot it and bring everything back
up. ;( So, from the koji hub's point of view, nothing is happening until
the builder is back up, realizes it should do that build, and starts it
over.

> I've seen tasks frequently get stuck at "dnf: Running transaction
> check" for *ages* (i.e. 30 minutes or longer), and after the builds on
> all other architectures were long done, they *sometimes* un-stuck
> themselves after a while and the build progressed (albeit very very
> slowly). At other times, the builds were just stuck completely - in
> these cases I've asked releng to free the ppc64le build to restart it,
> and that solved the problem (most of the time) ...
> 
> Asking on the fedora-infra IRC / Matrix channel, nirik mentioned that
> it might be caused by recent kernels (6.1 or 6.2), with 6.3 looking
> better at first glance.

Yeah, this is something we have seen with f37 (and now f38) on the
virthosts. It's really hard to isolate since there aren't really any
logs when it happens. ;(

That said, yes, I did try one of them with a 6.3 kernel and it seemed to
be better (but the problem is also only sporadic, which makes it even
harder to isolate).

I've prepped all of them with 6.3 now, but I don't want to reboot them
yet since gcc, webkitgtk, ceph, and llvm are all building away right
now. I'll try to do so this weekend.

If that doesn't help, I think the next thing to do would be to decrease
VM density. We were fine in the past with 10 VMs per virthost, but
perhaps if we drop to 8 or so it would take some of the pressure off.
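
As a rough back-of-the-envelope sketch of what that drop would buy, here
is the arithmetic in Python. It assumes host resources (CPU, memory, disk
IO) are split evenly across the VMs, which is a simplification and not
something measured on the virthosts. The function names are just for
illustration.

```python
from fractions import Fraction


def per_vm_share(vms_per_host: int) -> Fraction:
    """Fraction of one host's resources each VM gets, assuming an even split."""
    if vms_per_host <= 0:
        raise ValueError("vms_per_host must be positive")
    return Fraction(1, vms_per_host)


def relative_gain(old_density: int, new_density: int) -> Fraction:
    """Relative increase in per-VM share when moving from old to new density."""
    return per_vm_share(new_density) / per_vm_share(old_density) - 1


# Hypothetical even-split model: going from 10 VMs per host to 8.
gain = relative_gain(10, 8)
print(f"per-VM share grows by {float(gain):.0%}")
```

Under that even-split assumption each remaining VM gets a quarter more of
the host, at the cost of two fewer builders per virthost.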

Thanks for all the feedback, everyone. Hopefully we can get it back to
normal soon.

kevin