Re: [PATCH v8 00/12] s390x: CPU Topology

Pierre Morel <pmorel@xxxxxxxxxxxxx> · Fri, 15 Jul 2022 15:47:56 +0200

On 7/15/22 11:31, Janis Schoetterl-Glausch wrote:
On 7/14/22 22:05, Pierre Morel wrote:

On 7/14/22 20:43, Janis Schoetterl-Glausch wrote:
On 6/20/22 16:03, Pierre Morel wrote:
Hi,

This new spin is essentially for coherence with the last Linux CPU
Topology patch, function testing and coding style modifications.

Foreword
========

The goal of this series is to implement CPU topology for S390. It
improves on the preceding series by implementing books and
drawers and non-uniform CPU topology, and by adding documentation.

To use these patches, you will need version 10 of the Linux series.
You can find it here:
https://lkml.org/lkml/2022/6/20/590

Currently this code is for KVM only; I have no idea whether it would be
interesting to provide a TCG patch. If one is ever done, it will be in
another series.

To have a better understanding of the S390x CPU Topology and its
implementation in QEMU you can have a look at the documentation in the
last patch or follow the introduction below.

A short introduction
====================

CPU Topology is described in the S390 POP, essentially through the
description of two instructions:

PTF Perform Topology Function, used to poll for topology changes
      and to set the polarization, but the latter is not part of this item.

STSI Store System Information, with the SYSIB 15.1.x providing the Topology
      configuration.

S390 Topology is a 6-level hierarchical topology with up to 5 levels
      of containers. The last topology level specifies the CPU cores.

      This patch series only uses the two lower levels, sockets and cores.

      To get the information on the topology, S390 provides the STSI
      instruction, which stores a structure providing the list of the
      containers used in the machine topology: the SYSIB.
      A selector within the STSI instruction allows choosing how many
      topology levels will be provided in the SYSIB.

      Using the Topology List Entries (TLE) provided inside the SYSIB,
      the Linux kernel is able to compute the cache distance between
      two cores and can use this information to make scheduling
      decisions.
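
To illustrate the idea of a distance derived from container membership,
here is a minimal sketch. It is not the kernel's code and does not parse
real TLEs: it assumes each core has already been reduced to a list of
ids ordered from the outermost container down to the core itself, and
the function name topo_distance() is hypothetical.

```c
#include <assert.h>

/*
 * Hypothetical helper, for illustration only.
 * a[] and b[] hold the ids of two cores, ordered from the outermost
 * container to the core itself, e.g. { drawer, book, socket, core }.
 * The result is 0 for the same core and grows by one for each
 * hierarchy level that has to be crossed to find a common container.
 */
static int topo_distance(const int *a, const int *b, int nlevels)
{
    assert(nlevels > 0);

    for (int i = 0; i < nlevels; i++) {
        if (a[i] != b[i]) {
            /* First differing level, counted from the outside. */
            return nlevels - i;
        }
    }
    return 0;
}
```

With ids ordered { drawer, book, socket, core }, two cores in the same
socket get a distance of 1, and two cores in different drawers get 4.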

Do the socket, book, ... metaphors and looking at STSI from the existing
smp infrastructure even make sense?

Sorry, I do not understand.
I admit the cover letter is old and I have not really rewritten it well since the first patch series.

What we do is:
Compute the STSI from the SMP + NUMA + device QEMU parameters.

STSI 15.1.x reports the topology to the guest and for a virtual machine,
this topology can be very dynamic. So a CPU can move from one topology
container to another, but the socket of a CPU changing while it's running seems
a bit strange. And this isn't supported by this patch series as far as I understand;
the only topology changes are on hotplug.

A CPU moving from one socket to another is the only case, besides a new
CPU being plugged in, in which the PTF instruction reports a change in
the topology.

Can a CPU actually change between sockets right now?

To be exact, what I understand is that a shared CPU can be scheduled to 
another real CPU exactly as a guest vCPU can be scheduled by the host to 
another host CPU.

The socket-id is computed from the core-id, so it's fixed, is it not?

The virtual socket-id is computed from the virtual core-id.
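
A minimal sketch of such a mapping, assuming sockets are filled in
core-id order with a fixed number of cores per socket. The helper name
and the cores_per_socket parameter are hypothetical and only illustrate
why the result is fixed for a given core-id:

```c
#include <assert.h>

/*
 * Hypothetical helper, for illustration only.
 * Derives the virtual socket-id from the virtual core-id, assuming
 * every socket holds cores_per_socket consecutive core-ids.
 */
static int socket_id_from_core_id(int core_id, int cores_per_socket)
{
    assert(core_id >= 0 && cores_per_socket > 0);

    /* Integer division: cores 0..n-1 land in socket 0, n..2n-1 in 1, ... */
    return core_id / cores_per_socket;
}
```

Under this assumption the socket-id depends only on the core-id and the
machine configuration, so it cannot change while the vCPU is running.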

It is not expected to appear often but it does appear.
The code has been removed from the kernel in spin 10 for 2 reasons:
1) we decided to first support only dedicated and pinned CPUs
2) Christian fears it may happen too often due to Linux host scheduling and could be a performance problem

This seems sensible, but now it seems too static.
For example after migration, you cannot tell the guest which CPUs are in the same socket, book, ...,
unless I'm misunderstanding something.

No, to do this we would need to ask the kernel about it.

And migration is rare, but something you'd want to be able to react to.
And I could imagine that the vCPUs are pinned most of the time, but the pinning changes occasionally.

I think on migration we should just do a kvm_set_mtcr on post_load,
as Nico suggested; everything else seems complicated for a questionable
benefit.

--
Pierre Morel
IBM Lab Boeblingen