On 7/21/22 09:58, Pierre Morel wrote:
>
>
> On 7/20/22 19:24, Janis Schoetterl-Glausch wrote:
>> On 7/15/22 15:07, Pierre Morel wrote:
>>>
>>>
>>> On 7/15/22 11:11, Janis Schoetterl-Glausch wrote:
>>>> On 7/14/22 22:17, Pierre Morel wrote:
>>>>>
>>>>>
>>>>> On 7/14/22 16:57, Janis Schoetterl-Glausch wrote:
>>>>>> On 6/20/22 16:03, Pierre Morel wrote:
>>>>>>> S390x CPU Topology allows a non-uniform distribution of the CPUs
>>>>>>> inside the topology containers: sockets, books and drawers.
>>>>>>>
>>>>>>> We use numa to place the CPU inside the right topology container
>>>>>>> and report the non-uniform topology to the guest.
>>>>>>>
>>>>>>> Note that s390x needs CPU0 to belong to the topology and consequently
>>>>>>> every topology must include CPU0.
>>>>>>>
>>>>>>> We accept a partial QEMU numa definition; in that case undefined CPUs
>>>>>>> are added to free slots in the topology starting with slot 0 and going
>>>>>>> up.
>>>>>>
>>>>>> I don't understand why doing it this way, via numa, makes sense for us.
>>>>>> We report the topology to the guest via STSI, which tells the guest
>>>>>> what the topology "tree" looks like. We don't report any numa distances to the guest.
>>>>>> The natural way to specify where a cpu is added to the vm seems to me to be
>>>>>> by specifying the socket, book, ... IDs when doing a device_add or via -device on
>>>>>> the command line.
>>>>>>
>>>>>> [...]
>>>>>>
>>>>>
>>>>> It is a choice to have the core-id determine where the CPU is situated in the topology.
>>>>>
>>>>> But yes, we can choose to use drawer-id, book-id, socket-id and use a core-id starting at 0 on each socket.
>>>>>
>>>>> It is not done in the current implementation because the core-id, together with the smp parameters, implies the socket-id, book-id and drawer-id.
>>>>>
>>>>>
>>>> Regardless of whether the core-id or the combination of socket-id, book-id, ... is used to specify where a CPU is
>>>> located, why use the numa framework and not just device_add or -device?
>>>
>>> You are right, at least we should be able to use both.
>>> I will work on this.
>>>
>>>>
>>>> That feels way more natural since it should already just work if you can do hotplug.
>>>> At least with core-id, and I suspect with a subset of your changes also with socket-id, etc.
>>>
>>> Yes, it already works with core-id.
>>>
>>>>
>>>> Whereas numa is an awkward fit since it's for specifying distances between nodes, which we don't do,
>>>> and you have to use a hack to get it to specify which CPUs to plug (via setting arch_id to -1).
>>>>
>>>
>>> Is it only for this?
>>>
>> That's what it looks like to me, but I'm not an expert by any means.
>> x86 reports distances and more via ACPI, riscv via device tree, and power appears to
>> calculate hierarchy values which the linux kernel will turn into distances again.
>> That's maybe closest to s390x. However, as far as I can tell all of that is static
>> and cannot be reconfigured. If we want to have STSI dynamically reflect the topology
>> at some point in the future, we should have a roadmap for how to achieve that.
>>
>>
>
>
> You are right, numa is redundant for us as we specify the topology using the core-id.
> The roadmap I would like to discuss is using a new HMP command:
>
> (qemu) cpu_move src dst
>
> where src is the current core-id and dst is the destination core-id.
>
> I am aware that there are deep implications for the current cpu code, but I do not think it is impossible.
> If it is impossible, then we would need a new argument to device_add for cpus to define the "effective_core_id".
> But we will still need the new hmp command to update the topology.
>
I don't think core-id is the right one; that's the guest-visible CPU address, isn't it?
It seems badly named then, since multiple threads are part of the same core
(OK, we don't support threads).
Instead, socket-id, book-id, etc. could be changed dynamically rather than being
computed from the core-id.
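
To make "computed from the core-id" concrete, here is a minimal sketch of how a
flat core-id could map to a (drawer, book, socket) position given the smp
parameters. It is only an illustration, not the code from the patch: the
function name, the Position type and the linear fill order (cores fill a
socket, sockets fill a book, books fill a drawer) are my assumptions, and I
wrote it in Python for brevity.

```python
from typing import NamedTuple


class Position(NamedTuple):
    """Hypothetical container for a CPU's place in the topology tree."""
    drawer: int
    book: int
    socket: int
    core: int  # index of the core within its socket


def position_from_core_id(core_id: int, cores: int, sockets: int,
                          books: int, drawers: int) -> Position:
    """Derive the topology position from a flat core-id.

    Assumes a linear layout: `cores` per socket, `sockets` per book,
    `books` per drawer, `drawers` in the machine (as given by the smp
    parameters). This fill order is an assumption for illustration.
    """
    if min(cores, sockets, books, drawers) < 1:
        raise ValueError("all smp parameters must be >= 1")
    total = cores * sockets * books * drawers
    if not 0 <= core_id < total:
        raise ValueError(f"core_id {core_id} outside 0..{total - 1}")

    # Peel off one topology level per division, innermost level first.
    socket_global, core = divmod(core_id, cores)
    book_global, socket = divmod(socket_global, sockets)
    drawer, book = divmod(book_global, books)
    return Position(drawer, book, socket, core)


if __name__ == "__main__":
    # Example: 4 cores per socket, 2 sockets per book, 2 books per drawer,
    # 2 drawers, i.e. 32 possible core-ids.
    for cid in (0, 5, 8, 31):
        print(cid, position_from_core_id(cid, 4, 2, 2, 2))
```

With such a fixed mapping, moving a CPU to another socket necessarily changes
its core-id, which is why I would rather make socket-id, book-id, etc. the
values that can change.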