Re: s390x KOJI builders issue

Kevin Fenzi <kevin@xxxxxxxxx> · Thu, 3 Mar 2022 10:38:08 -0800

On Thu, Mar 03, 2022 at 02:32:50AM +0100, Michal Schorm wrote:
> In many cases, the build is killed during compilation itself.
> I'd understand the situation, if it would consistently fail somewhere
> during the testsuite on OOM errors, but it's weirder than that.
> 
> Until now, I didn't have this issue. Why now?

In January we got more s390x resources and rebalanced things.
Before Jan 18th the builders had 20GB mem and 4 cpus.
So I suspect if this started happening after that, that's the cause?

> The tests are still important.

Agreed completely. 

> Through the years I took several steps to reduce the resource usage
> for the testsuite.
> The most significant is that I ran the full testsuite only once or few
> times in scratch builds, and when I didn't find any issues worth
> investigating, I switch the testsuite to a minimal mode for every
> other build of the same minor versions.
> So e.g. mass rebuilds which only bump patch numbers in the NVR run
> only the 'main' suite. As well as other small patches during the life
> of that particular upstream release.
> 
> The issue in general is:
> We have the majority of packages which are small and quick to build.
> Then we have a minority of insanely huge projects, whose resource
> thirst can never be quenched. :)
> 
> Could we somehow just identify the huge packages, mark them in a
> special way, and when KOJI would pick up such marked packages, it
> would give it much more resources ?
> At the same time, the average amount of resources given should be
> lowered to only what most packages need.
> I believe all could benefit from this.

Yes, but it gets complex. 

Koji has the ability to set policy and send builds matching some
expression to a specific koji 'channel' (i.e., a group of builders).

I had to do this for chromium a while back. It was never finishing on
aarch64 on normal builders. We have 2 buildhw's that are bare metal and
have a lot of memory/cpus, so I set those into a heavybuilder channel.
But a channel cannot be per arch, so I had to add a bunch of x86 builders
also for the x86_64 build. Sounds great, right?

But... if I just add more packages to that channel, there's only 2
aarch64 builders. So, when Tom submits say 4 chromiums, any other
packages that are submitted will just wait until those all finish before
even starting. :( 

i.e., if we have a heavybuild channel, it needs enough builders in it to
build as many of the big heavy packages at once as people normally do,
or else it's going to serialize builds badly behind whichever arch has
the fewest builders.
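
As a rough illustration of that serialization, here is a minimal sketch. It
assumes every build is submitted at the same moment and handed out in order
to whichever builder frees up first; that is a simplification, not how koji
actually schedules tasks. The function name and the build durations (10 hours
per big build, 1 hour for a small one) are made up for the example.

```python
import heapq


def fifo_start_times(durations, builder_count):
    """Return the start time of each build, in submission order.

    Assumes all builds are queued at time 0 and each one goes to the
    builder that becomes free earliest (a simplified FIFO model).
    """
    # Min-heap of the times at which each builder next becomes free.
    free_at = [0.0] * builder_count
    heapq.heapify(free_at)

    starts = []
    for duration in durations:
        start = heapq.heappop(free_at)   # earliest available builder
        starts.append(start)
        heapq.heappush(free_at, start + duration)
    return starts


# Hypothetical durations in hours: four big builds, then one small package.
queue = [10, 10, 10, 10, 1]

for builders in (2, 4, 5):
    small_pkg_start = fifo_start_times(queue, builders)[-1]
    print(f"{builders} builders: small package starts at hour {small_pkg_start:g}")
```

With only 2 builders in the channel the small package does not start until
all four big builds are done; it takes as many builders as there are
concurrent big builds, plus one, before it starts right away.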

So, I'm open to setting mariadb into a channel with bigger builders, but
realize that may mean that there's fewer of them and they may sometimes
have to wait for a builder. ;( 

If this is just s390x builders, I'd prefer to see if we can rebalance
them to just pass your builds. So, looking at it, we have 20 buildvm's
on a host with 256GB mem. I could bump them all from 10GB to 12GB without
overcommitting. I don't know if 2GB would help enough though? Is that worth
trying before anything else? If that doesn't work, we could reduce the
number of builders and consolidate them into fewer, bigger ones. ;(
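
The arithmetic behind that, as a small sketch. The host size, VM count and
per-VM sizes are the numbers above; the helper names are made up for the
example, and no memory is set aside for the host itself unless you pass a
reserve.

```python
def total_guest_memory_gb(vm_count, mem_per_vm_gb):
    """Memory committed to guests if every VM gets mem_per_vm_gb."""
    return vm_count * mem_per_vm_gb


def max_mem_per_vm_gb(host_mem_gb, vm_count, host_reserve_gb=0):
    """Largest whole-GB size per VM that does not overcommit the host."""
    return (host_mem_gb - host_reserve_gb) // vm_count


HOST_MEM_GB = 256
VM_COUNT = 20

for per_vm in (10, 12):
    used = total_guest_memory_gb(VM_COUNT, per_vm)
    print(f"{VM_COUNT} x {per_vm}GB = {used}GB, "
          f"{HOST_MEM_GB - used}GB left on the host")

# Ceiling per VM with the current number of builders.
print("max per VM:", max_mem_per_vm_gb(HOST_MEM_GB, VM_COUNT), "GB")

# Consolidating to fewer builders raises the ceiling, e.g. 10 instead of 20.
print("max per VM with 10 builders:", max_mem_per_vm_gb(HOST_MEM_GB, 10), "GB")
```

So 12GB is already the ceiling with 20 builders on that host; going beyond
that means fewer builders.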

Thoughts?

kevin