Re: [PATCH v8 08/12] s390x/cpu_topology: implementing numa for the s390x topology

Pierre Morel <pmorel@xxxxxxxxxxxxx> · Thu, 21 Jul 2022 09:58:17 +0200

On 7/20/22 19:24, Janis Schoetterl-Glausch wrote:
> On 7/15/22 15:07, Pierre Morel wrote:
>
>> On 7/15/22 11:11, Janis Schoetterl-Glausch wrote:
>>> On 7/14/22 22:17, Pierre Morel wrote:
>>>
>>>> On 7/14/22 16:57, Janis Schoetterl-Glausch wrote:
>>>>> On 6/20/22 16:03, Pierre Morel wrote:
>>>>>> The s390x CPU topology allows a non-uniform distribution of CPUs
>>>>>> across the topology containers: sockets, books and drawers.
>>>>>>
>>>>>> We use NUMA to place each CPU inside the right topology container
>>>>>> and report the non-uniform topology to the guest.
>>>>>>
>>>>>> Note that s390x needs CPU0 to belong to the topology; consequently,
>>>>>> every topology must include CPU0.
>>>>>>
>>>>>> We accept a partial QEMU NUMA definition; in that case, CPUs not
>>>>>> placed by the definition are added to free slots in the topology,
>>>>>> starting with slot 0 and going up.
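The slot-filling rule for CPUs left out of a partial NUMA definition can be sketched as below. This is a minimal illustration, not the series' code: the `slot_used` table, the function `place_in_first_free_slot()` and the flat slot numbering are hypothetical, and the sketch assumes slots are numbered linearly across drawers, books and sockets.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Hypothetical sketch: slot_used[i] is true when core slot i was already
 * taken by a CPU placed explicitly (e.g. by the NUMA definition).
 * A CPU without an explicit placement takes the lowest free slot.
 * Returns the slot index, or -1 if every slot is occupied.
 */
static int place_in_first_free_slot(bool slot_used[], int nslots)
{
    assert(nslots > 0);
    for (int i = 0; i < nslots; i++) {
        if (!slot_used[i]) {
            slot_used[i] = true; /* claim the slot for this CPU */
            return i;
        }
    }
    return -1; /* topology is full */
}
```

For example, if the NUMA definition places CPUs in slots 1 and 4, three further unplaced CPUs would land in slots 0, 2 and 3, so slot 0 is always filled first when it is free.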

>>>>> I don't understand why doing it this way, via NUMA, makes sense for us.
>>>>> We report the topology to the guest via STSI, which tells the guest
>>>>> what the topology "tree" looks like. We don't report any NUMA distances
>>>>> to the guest.
>>>>> The natural way to specify where a CPU is added to the VM seems to me to
>>>>> be by specifying the socket, book, ... IDs when doing a device_add or via
>>>>> -device on the command line.

[...]

>>>> It is a design choice to have the core-id determine where the CPU is
>>>> situated in the topology.
>>>>
>>>> But yes, we could choose to use drawer-id, book-id and socket-id, with a
>>>> core-id starting at 0 in each socket.
>>>>
>>>> It is not done in the current implementation because the core-id,
>>>> together with the smp parameters, implies the socket-id, book-id and
>>>> drawer-id.
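To make concrete how a core-id plus the smp parameters implies the container IDs, here is a minimal sketch of the arithmetic in both directions. It assumes core-ids are numbered linearly with cores varying fastest, then sockets, books and drawers, and that each container ID is relative to its parent; `TopoGeometry`, `TopoPosition` and both functions are hypothetical names, not QEMU's.

```c
#include <assert.h>

/* Hypothetical container sizes, standing in for the -smp parameters. */
typedef struct {
    int books_per_drawer;
    int sockets_per_book;
    int cores_per_socket;
} TopoGeometry;

/* Position of one core; each ID is relative to its parent container. */
typedef struct {
    int drawer_id;
    int book_id;
    int socket_id;
    int core_in_socket;
} TopoPosition;

/* Split a linear core-id into drawer/book/socket/core (cores vary fastest). */
static TopoPosition core_id_to_position(const TopoGeometry *g, int core_id)
{
    TopoPosition p;
    int socket_index, book_index;

    assert(core_id >= 0);
    assert(g->cores_per_socket > 0 && g->sockets_per_book > 0 &&
           g->books_per_drawer > 0);

    p.core_in_socket = core_id % g->cores_per_socket;
    socket_index = core_id / g->cores_per_socket;   /* global socket index */
    p.socket_id = socket_index % g->sockets_per_book;
    book_index = socket_index / g->sockets_per_book; /* global book index */
    p.book_id = book_index % g->books_per_drawer;
    p.drawer_id = book_index / g->books_per_drawer;
    return p;
}

/* Inverse mapping: rebuild the linear core-id from the container IDs. */
static int position_to_core_id(const TopoGeometry *g, const TopoPosition *p)
{
    int book_index = p->drawer_id * g->books_per_drawer + p->book_id;
    int socket_index = book_index * g->sockets_per_book + p->socket_id;
    return socket_index * g->cores_per_socket + p->core_in_socket;
}
```

With 2 books per drawer, 2 sockets per book and 4 cores per socket, core-id 13 maps to drawer 0, book 1, socket 1, core 1, and `position_to_core_id()` maps that position back to 13. The inverse direction is roughly what the alternative addressing (drawer-id, book-id, socket-id plus a per-socket core-id) would need.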

>>> Regardless of whether the core-id or the combination of socket-id,
>>> book-id, ... is used to specify where a CPU is located, why use the NUMA
>>> framework and not just device_add or -device?

>> You are right; at the least, we should be able to use both.
>> I will work on this.

>>> That feels way more natural, since it should already just work if you
>>> can do hotplug: at least with core-id and, I suspect, with a subset of
>>> your changes also with socket-id, etc.

>> Yes, it already works with core-id.

>>> Whereas NUMA is an awkward fit, since it's for specifying distances
>>> between nodes, which we don't do, and you have to use a hack (setting
>>> arch_id to -1) to get it to specify which CPUs to plug.

>> Is NUMA only for this, i.e. for specifying distances?

> That's what it looks like to me, but I'm not an expert by any means.
> x86 reports distances and more via ACPI, RISC-V via the device tree, and
> POWER appears to calculate hierarchy values which the Linux kernel turns
> back into distances. That is maybe the closest to s390x. However, as far
> as I can tell, all of that is static and cannot be reconfigured. If we
> want STSI to dynamically reflect the topology at some point in the
> future, we should have a roadmap for how to achieve that.

You are right: NUMA is redundant for us, as we specify the topology using
the core-id.
The roadmap I would like to discuss is to add a new HMP command:

(qemu) cpu_move src dst

where src is the current core-id and dst is the destination core-id.
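One way to picture the proposed cpu_move semantics is as an update to a table from core-id to vCPU. The sketch below is purely hypothetical bookkeeping, not QEMU code: `CoreSlots`, `NR_CORE_SLOTS`, `SLOT_EMPTY` and `cpu_move()` are invented for illustration, and it assumes a move is refused when src is empty, dst is already occupied, or either is out of range.

```c
#include <assert.h>
#include <stdbool.h>

#define NR_CORE_SLOTS 16 /* hypothetical maximum number of core-ids */
#define SLOT_EMPTY (-1)

/* Hypothetical table: slot_owner[core_id] is the index of the vCPU
 * occupying that core-id, or SLOT_EMPTY if the core-id is free. */
typedef struct {
    int slot_owner[NR_CORE_SLOTS];
} CoreSlots;

static void core_slots_init(CoreSlots *s)
{
    for (int i = 0; i < NR_CORE_SLOTS; i++) {
        s->slot_owner[i] = SLOT_EMPTY;
    }
}

static bool valid_core_id(int id)
{
    return id >= 0 && id < NR_CORE_SLOTS;
}

/* Move the vCPU at core-id src to core-id dst. Returns false and leaves
 * the table unchanged if either ID is out of range, src is empty, or
 * dst is already occupied. */
static bool cpu_move(CoreSlots *s, int src, int dst)
{
    if (!valid_core_id(src) || !valid_core_id(dst)) {
        return false;
    }
    if (s->slot_owner[src] == SLOT_EMPTY || s->slot_owner[dst] != SLOT_EMPTY) {
        return false;
    }
    s->slot_owner[dst] = s->slot_owner[src];
    s->slot_owner[src] = SLOT_EMPTY;
    return true;
}
```

The table update itself is trivial; presumably the hard part is keeping the existing vCPU state and the topology reported through STSI consistent with the new core-id after such a move.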

I am aware that this has deep implications for the current CPU code, but
I do not think it is impossible.
If it is impossible, then we would need a new argument to device_add for
CPUs to define the "effective_core_id".
But we would still need the new HMP command to update the topology.

--
Pierre Morel
IBM Lab Boeblingen