Re: Slower builds on non-x86_64 arches - number of cores?

Kevin Fenzi <kevin@xxxxxxxxx> · Sun, 27 Oct 2024 10:35:55 -0700

On Sat, Oct 26, 2024 at 09:00:26PM -0600, Orion Poplawski wrote:
> I've noticed that for quite a while vtk and paraview take significantly
> longer to build on non-x86_64 arches.  The majority of the time appears to
> be in the compilation itself.
> 
> My current guess is that it is particularly helped by having more cores.  I
> currently reduce the number of cores for the paraview ppc64le builds - this
> was done back in 2020 to avoid memory issues, and I'm testing out relaxing
> that.
> 
> I'm quite impressed with how well s390x does with only 3 cores.

Yeah, they are pretty fast too:
    CPU static MHz:      5200
> 
> vtk: https://koji.fedoraproject.org/koji/taskinfo?taskID=125225794
> 
> x86_64:  2:16:39 (-j48)
> aarch64: 3:44:25 (-j12)
> ppc64le: 5:17:29 (-j8)
> s390x:   6:47:39 (-j3)
> 
> paraview: https://koji.fedoraproject.org/koji/taskinfo?taskID=125225830
> (which bundles vtk)
> 
> x86_64:  2:21:58  (-j48)
> aarch64: 3:26:30  (-j12)
> ppc64le: 12:02:47 (-j2) (still not finished)
> s390x:   7:42:33  (-j3)
> 
> I know that it must be quite a challenge figuring out the right balance
> between the number of builders and how many cores per builder there are.
> But at least for these two packages, it seems like the ppc64le builds might
> benefit from some more cores.

Yeah, it's a balancing act. If you make too many smaller VMs, then the
large builds take a long time to finish. However, if you make fewer,
larger builders, the large builds finish faster, but they still take a
while, and the smaller builds that would otherwise be quick get stuck
waiting for them to finish. Additionally, memory and CPUs are tightly
tied together: if you just add more cores without more memory, some
packages (C++ in particular) start OOMing and never build (because they
try to launch N parallel jobs, which means N compiler processes that
each use a lot of memory). Also, the debuginfo extraction part of the
process can take a lot of memory for things that have a ton of symbols,
etc.
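
To make the cores-versus-memory coupling concrete, here is a minimal
sketch of picking a -j value that fits both budgets. The function name,
the per-job memory figure and the reserve are assumptions for
illustration only, not how the Fedora builders are actually configured.

```python
def max_parallel_jobs(cores: int, mem_gib: float, gib_per_job: float,
                      reserve_gib: float = 2.0) -> int:
    """Largest -j value that fits both the CPU and the memory budget.

    gib_per_job is a guess at peak memory per compiler process and
    reserve_gib is memory kept back for the rest of the system; both
    are hypothetical tuning knobs.
    """
    if gib_per_job <= 0:
        raise ValueError("gib_per_job must be positive")
    usable = max(mem_gib - reserve_gib, 0.0)
    by_memory = int(usable // gib_per_job)
    # Never go below one job, never above the core count.
    return max(1, min(cores, by_memory))


# Adding cores without adding memory does not raise the usable -j:
print(max_parallel_jobs(cores=8, mem_gib=32, gib_per_job=4))
print(max_parallel_jobs(cores=16, mem_gib=32, gib_per_job=4))
```

With these made-up numbers both calls are limited by memory rather than
by cores, which is the situation described above.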

Also, there are things like mass rebuilds, where you need more builders,
even if small, for things like making srpms or smaller noarch builds.

> I wonder if it would be possible to take a page from the HPC batch
> scheduling playbook and possibly be able to direct parallel capable builds
> to larger builders?  And very serial jobs to low core ones, as I imagine
> your basic python package build probably takes about the same on 1 core as
> 48.

Well, I am not sure how koji would know this... would need some kind of
lookup table/etc. Adding different sizes of builders is possible, but
then it's a lot more complexity, a lot of admin overhead, a lot of
manual updating things. So, in general I prefer to try and keep all the
builders the same. Of course we already don't do that entirely, because
webkitgtk needs more memory/cores to avoid OOMing and just not building,
so there's a few larger ppc64le builders in a 'heavybuild' channel for it.
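
As a rough illustration of the "lookup table" idea, something like the
following could map package names to a builder channel. This is only a
sketch: the table contents and the pick_channel helper are hypothetical,
and it is not how koji or the hub policy actually expresses this.

```python
from fnmatch import fnmatch

# Hypothetical table: glob pattern for the package name -> channel.
# 'heavybuild' is the channel mentioned above; 'default' stands in for
# the ordinary builders.
CHANNEL_TABLE = [
    ("webkitgtk*", "heavybuild"),
]


def pick_channel(package: str, table=CHANNEL_TABLE, fallback: str = "default") -> str:
    """Return the channel of the first matching pattern, else the fallback."""
    for pattern, channel in table:
        if fnmatch(package, pattern):
            return channel
    return fallback


print(pick_channel("webkitgtk"))
print(pick_channel("python-requests"))
```

The catch is the one already mentioned: somebody has to keep the table
up to date by hand.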

So, you might say, "great! let's add vtk to that". But then it runs into
the scarcity problem. If there's a bunch of webkitgtk builds, vtk could
actually take _longer_ because it's waiting for one of those builders to
become available. 
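
A small back-of-the-envelope sketch of that scarcity effect, assuming a
simple first-come-first-served queue. All durations and builder counts
here are invented for illustration; they are not measurements from koji.

```python
import heapq


def finish_time(queued_hours, builders, job_hours):
    """Hour at which a new job finishes if it joins a FIFO queue behind
    the jobs in queued_hours, served by `builders` identical machines."""
    free_at = [0.0] * builders
    heapq.heapify(free_at)
    for hours in queued_hours:
        start = heapq.heappop(free_at)       # next builder to become free
        heapq.heappush(free_at, start + hours)
    start = heapq.heappop(free_at)
    return start + job_hours


# Hypothetical: two heavy builders with seven 4-hour builds already
# queued, versus an idle regular builder where the job itself is slower.
heavy = finish_time([4.0] * 7, builders=2, job_hours=3.0)
regular = finish_time([], builders=1, job_hours=5.3)
print(f"heavy channel: {heavy:.1f}h, regular builder: {regular:.1f}h")
```

In this toy case the job is faster on the big builder, but waiting for
one to free up makes it finish much later overall.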

There also seems to be a tendency for the larger builds (probably due to
security updates) to do f42/41/40/39/epel10/epel9/epel8 builds, so that's
7 big builds at a time.

Also, there's CI, where we have seen it flood a bunch of llvm scratch
builds or the like and take up lots of builders until it finishes. 

As a side note, I removed s390x from the noarch_arches when that was
going on, but I re-added it a while back. (So noarch builds can also
happen on s390x.) Also, changes in koji 1.35 meant that srpm build
tasks just use noarch_arches, so they too can happen on s390x. I've been
watching things and so far this doesn't seem to be a problem.

> For comparison, here is a long build that isn't that different on the
> different architectures.  I think most of the time is doing tests, but it
> shows that at some level the speed of the various builders are relatively
> equivalent - and so maybe vtk and paraview are very special packages and not
> worth optimizing for:
> 
> lammps: https://koji.fedoraproject.org/koji/taskinfo?taskID=125213968
> 
> x86_64:  6:34:08
> aarch64: 6:44:23
> ppc64le: 7:16:10
> s390x:   6:43:51
> 
> It might be interesting to go through build times and see if many longer
> jobs are unbalanced on the different architectures.
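
One way to do the comparison suggested above is to express each arch's
build time as a multiple of the x86_64 time. A minimal sketch using only
the vtk and lammps numbers quoted in this thread; the helper names are
made up, and pulling times for other packages out of koji is left out.

```python
def to_seconds(hms: str) -> int:
    """Convert an H:MM:SS string to seconds."""
    hours, minutes, seconds = (int(part) for part in hms.split(":"))
    return hours * 3600 + minutes * 60 + seconds


# Build times copied from the figures quoted in this thread.
BUILD_TIMES = {
    "vtk": {"x86_64": "2:16:39", "aarch64": "3:44:25",
            "ppc64le": "5:17:29", "s390x": "6:47:39"},
    "lammps": {"x86_64": "6:34:08", "aarch64": "6:44:23",
               "ppc64le": "7:16:10", "s390x": "6:43:51"},
}


def slowdown_vs(times: dict, baseline: str = "x86_64") -> dict:
    """Each arch's build time as a multiple of the baseline arch."""
    base = to_seconds(times[baseline])
    return {arch: round(to_seconds(t) / base, 2) for arch, t in times.items()}


for package, times in BUILD_TIMES.items():
    print(package, slowdown_vs(times))
```

With these numbers vtk on ppc64le comes out at roughly 2.3 times the
x86_64 time, while lammps is about 1.1 times. Keep in mind the vtk
builds ran with different -j values per arch, so the ratio mixes core
count and per-core speed.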

I'm open to suggestions on how to rebalance things. I would vastly
prefer they all just be the same instead of some complex matrix of
things, and would prefer we don't have to try and manually do things all
the time for it. (for example, making particular packages use particular
channels needs changes in the hub policy). 

koji does have a concept of 'weight', but I am not sure it helps us too
much.

Also, longer term for ppc64le I have put in for budget next year to add
memory/faster disks and there is also possibly power10 on the horizon,
which I hope will be faster/bigger.

kevin
-- 
_______________________________________________
devel mailing list -- devel@xxxxxxxxxxxxxxxxxxxxxxx