Re: s390x KOJI builders issue

On Wed, Mar 02, 2022 at 03:54:32PM +0100, Florian Weimer wrote:
> * Michael Catanzaro:
> 
> > On Wed, Mar 2 2022 at 02:21:22 PM +0100, Dan Horák <dan@xxxxxxxx>
> > wrote:
> >> those are weird, the build tasks have been restarted many times by the
> >> builder daemon, after something crashed there (OOM?) ...
> >
> > This was happening to me on armv7hl a few weeks ago. Kevin Fenzi
> > investigated and discovered that the builds kept hitting an OOM 
> > condition and then restarting, which triggered an infinite loop. Each
> > build would work for 3-5 hours before failing, then it would start 
> > over, then again, then again....
> >
> > I think some configuration changed recently on the builders, because I
> > had never seen this happen before last month. If a build hits OOM, it 
> > really needs to fail immediately. It should not restart, because it's
> > likely to fail again the same way. My builds had restarted four or
> > five times before Kevin manually handled them.
> 
> Maybe Koji restarts the build because the builder has rebooted?

Nope.

What happens is:

* 10: The build is taken by a builder and starts building.
* The build takes up more than 90% of memory+swap.
* The OOM killer looks and says... oh hey, I need to kill something. This
kojid process/slice is taking up all the memory.
* kojid is killed.
* kojid is restarted (we have it set to restart in its systemd unit).
* The builder checks in with the hub.
* The hub says, hey, you are doing task XXXXX, right?
* The builder says... oh, yes, let me start that.
* goto 10
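To make the loop concrete, here is a toy model of the steps above. It is a hypothetical illustration, not Koji code: run_task, peak_mem_gb, limit_gb and max_restarts are made-up names. It shows that nothing in the cycle ever changes the outcome, so without some restart cap (also hypothetical here) a task that always exceeds memory never finishes.

```python
from typing import Optional, Tuple


def run_task(peak_mem_gb: float, limit_gb: float,
             max_restarts: Optional[int] = None) -> Tuple[str, int]:
    """Toy model of the kojid/hub cycle; returns (final state, restarts)."""
    restarts = 0
    while True:
        # "10:" the builder takes (or resumes) the task and starts building.
        if peak_mem_gb <= limit_gb:
            return ("closed", restarts)
        # OOM killer kills kojid, systemd restarts it, and the hub hands
        # the very same task back to the builder.
        restarts += 1
        if max_restarts is not None and restarts > max_restarts:
            # Hypothetical cap: fail the task instead of retrying forever.
            return ("failed", restarts)
        if restarts >= 1000:
            # Safety valve for this model only; the real loop has no limit.
            return ("still looping", restarts)
```

For example, run_task(12, 9) returns ("still looping", 1000), while run_task(12, 9, max_restarts=2) returns ("failed", 3).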

So in this case it seems like it's the tests that are causing this.
The s390x KVM builders have 2 CPUs and 10 GB of memory.

So, is there any way to decrease memory usage there?
I see the tests are run with -parallel=auto; perhaps that could be set to 1 or 2?
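One way to pick that number instead of guessing is to cap test parallelism by memory as well as by CPU count. The sketch below assumes you know roughly how much memory one test process needs; mem_per_job_gb, headroom and both function names are hypothetical and not taken from the package's build system.

```python
import os


def pick_parallelism(cpus: int, mem_gb: float, mem_per_job_gb: float,
                     headroom: float = 0.8) -> int:
    """Largest job count that fits both the CPUs and a share of RAM."""
    # Only budget a fraction of RAM so the OOM killer stays out of it.
    by_memory = int(mem_gb * headroom // mem_per_job_gb)
    # Never go below one job, never above the CPU count.
    return max(1, min(cpus, by_memory))


def host_parallelism(mem_per_job_gb: float) -> int:
    """Same calculation using this machine's CPU count and physical RAM."""
    cpus = os.cpu_count() or 1
    mem_gb = os.sysconf("SC_PAGE_SIZE") * os.sysconf("SC_PHYS_PAGES") / 1024**3
    return pick_parallelism(cpus, mem_gb, mem_per_job_gb)
```

On a 2 CPU / 10 GB builder, pick_parallelism(2, 10, 6) gives 1 and pick_parallelism(2, 10, 3) gives 2.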

Perhaps there's some way to adjust the OOM killer to kill the build
instead of kojid? I would prefer that, because then the build would
fail quickly, and you could see that it was killed and that its memory
consumption needs to be reduced somehow.
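A minimal sketch of that idea, assuming the build runs as a child process of the daemon: the child raises its own /proc/self/oom_score_adj to 1000 (the maximum, which needs no privileges) before exec, so the OOM killer prefers it over its parent. run_build is a hypothetical helper, not kojid's actual code. Because children inherit oom_score_adj, everything the build spawns would also be preferred.

```python
import signal
import subprocess
from typing import Sequence


def _make_oom_victim() -> None:
    # Runs in the child after fork() and before exec(). Raising
    # oom_score_adj needs no privileges, and 1000 makes this process the
    # OOM killer's first choice. The value is kept across exec().
    with open("/proc/self/oom_score_adj", "w") as f:
        f.write("1000")


def run_build(cmd: Sequence[str]) -> subprocess.CompletedProcess:
    """Hypothetical helper: run a build so the OOM killer targets it first."""
    result = subprocess.run(list(cmd), preexec_fn=_make_oom_victim,
                            capture_output=True, text=True)
    if result.returncode == -signal.SIGKILL:
        # The OOM killer uses SIGKILL; report a failed build instead of
        # letting the daemon die and restart the whole task.
        print("build was killed (likely OOM); reduce its memory use")
    return result
```

Running run_build(["cat", "/proc/self/oom_score_adj"]) prints 1000 on Linux. Note that preexec_fn is not safe in multithreaded programs, so a real daemon might prefer a small wrapper script that writes the value and then execs the build.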

I suppose we could look at reducing the number of builders and
increasing the memory on each of the remaining ones, but it's hard to
know what the right value is there. It's definitely better for mass
rebuilds to have more, smaller builders.

kevin


_______________________________________________
devel mailing list -- devel@xxxxxxxxxxxxxxxxxxxxxxx
To unsubscribe send an email to devel-leave@xxxxxxxxxxxxxxxxxxxxxxx
Fedora Code of Conduct: https://docs.fedoraproject.org/en-US/project/code-of-conduct/
List Guidelines: https://fedoraproject.org/wiki/Mailing_list_guidelines
List Archives: https://lists.fedoraproject.org/archives/list/devel@xxxxxxxxxxxxxxxxxxxxxxx
Do not reply to spam on the list, report it: https://pagure.io/fedora-infrastructure
