Re: [PATCH v5 1/1] s390x: KVM: accept STSI for CPU topology information

Pierre Morel <pmorel@xxxxxxxxxxxxx> · Mon, 13 Dec 2021 11:16:51 +0100

On 12/9/21 17:08, Janosch Frank wrote:
On 11/22/21 14:14, Pierre Morel wrote:
We let the userland hypervisor know if the machine supports the CPU
topology facility using a new KVM capability: KVM_CAP_S390_CPU_TOPOLOGY.

The PTF instruction will report a topology change if there is any change
relative to a previous STSI_15_1_2 SYSIB.
Changes inside a STSI_15_1_2 SYSIB occur if CPU bits are set or cleared
inside the CPU Topology List Entry CPU mask field, which happens with
changes in CPU polarization, dedication, CPU types and adding or
removing CPUs in a socket.

The reporting to the guest is done using the Multiprocessor
Topology-Change-Report (MTCR) bit of the utility entry of the guest's
SCA which will be cleared during the interpretation of PTF.

To check if the topology has been modified we use a new field of the
arch vCPU to save the previous real CPU ID at the end of a schedule
and verify on next schedule that the CPU used is in the same socket.

We assume in this patch:
- no polarization change: only horizontal polarization is currently
   used in Linux.
- no CPU type change: only the IFL type is supported in Linux.
- Dedication: with this patch, only a completely dedicated CPU stack can
   benefit from the CPU Topology.

STSI(15.1.x) gives information on the CPU configuration topology.
Let's accept the interception of STSI with the function code 15 and
let the userland part of the hypervisor handle it when userland
supports the CPU Topology facility.

Signed-off-by: Pierre Morel <pmorel@xxxxxxxxxxxxx>
---
  Documentation/virt/kvm/api.rst   | 16 ++++++++++
  arch/s390/include/asm/kvm_host.h | 14 ++++++---
  arch/s390/kvm/kvm-s390.c         | 52 +++++++++++++++++++++++++++++++-
  arch/s390/kvm/priv.c             |  7 ++++-
  arch/s390/kvm/vsie.c             |  3 ++
  include/uapi/linux/kvm.h         |  1 +
  6 files changed, 87 insertions(+), 6 deletions(-)

diff --git a/Documentation/virt/kvm/api.rst b/Documentation/virt/kvm/api.rst
index aeeb071c7688..e5c9da0782a6 100644
--- a/Documentation/virt/kvm/api.rst
+++ b/Documentation/virt/kvm/api.rst
@@ -7484,3 +7484,19 @@ The argument to KVM_ENABLE_CAP is also a bitmask, and must be a subset
  of the result of KVM_CHECK_EXTENSION.  KVM will forward to userspace
  the hypercalls whose corresponding bit is in the argument, and return
  ENOSYS for the others.
+
+8.17 KVM_CAP_S390_CPU_TOPOLOGY
+------------------------------
+
+:Capability: KVM_CAP_S390_CPU_TOPOLOGY
+:Architectures: s390
+:Type: vm
+
+This capability indicates that kvm will provide the S390 CPU Topology facility
+which consist of the interpretation of the PTF instruction for the Function
+Code 2 along with interception and forwarding of both the PTF instruction
+with function Codes 0 or 1 and the STSI(15,1,x) instruction to the userland

The capitalization of "Function code" is inconsistent.

ok


+hypervisor.
+
+The stfle facility 11, CPU Topology facility, should not be provided to the
+guest without this capability.
diff --git a/arch/s390/include/asm/kvm_host.h b/arch/s390/include/asm/kvm_host.h
index a604d51acfc8..cccc09a8fdab 100644
--- a/arch/s390/include/asm/kvm_host.h
+++ b/arch/s390/include/asm/kvm_host.h
@@ -95,15 +95,19 @@ struct bsca_block {
      union ipte_control ipte_control;
      __u64    reserved[5];
      __u64    mcn;
-    __u64    reserved2;
+#define ESCA_UTILITY_MTCR    0x8000
+    __u16    utility;
+    __u8    reserved2[6];
      struct bsca_entry cpu[KVM_S390_BSCA_CPU_SLOTS];
  };
  struct esca_block {
      union ipte_control ipte_control;
-    __u64   reserved1[7];
+    __u64   reserved1[6];
+    __u16    utility;
+    __u8    reserved2[6];
      __u64   mcn[4];
-    __u64   reserved2[20];
+    __u64   reserved3[20];

Note to self: Prime example for a move to reserved member names based on 
offsets.

yes
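
For illustration only, here is a minimal user-space sketch of what offset-based reserved names could look like for the start of the ESCA. The struct name esca_header_model and the members reserved08, reserved3a and reserved60 are hypothetical, and union ipte_control is modelled as a plain 64-bit value, which is an assumption of this sketch and not taken from the patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* User-space model of the first 0x100 bytes of struct esca_block.
 * Not the kernel definition: names and the 8-byte ipte_control
 * stand-in are assumptions made for this sketch. */
struct esca_header_model {
	uint64_t ipte_control;   /* 0x0000, stand-in for union ipte_control */
	uint64_t reserved08[6];  /* 0x0008 */
	uint16_t utility;        /* 0x0038 */
	uint8_t  reserved3a[6];  /* 0x003a */
	uint64_t mcn[4];         /* 0x0040 */
	uint64_t reserved60[20]; /* 0x0060 */
};

/* With offset-based names, a shifted member shows up as a mismatch
 * between the name and the checked offset. */
static_assert(offsetof(struct esca_header_model, reserved08) == 0x08, "reserved08");
static_assert(offsetof(struct esca_header_model, utility) == 0x38, "utility");
static_assert(offsetof(struct esca_header_model, reserved3a) == 0x3a, "reserved3a");
static_assert(offsetof(struct esca_header_model, mcn) == 0x40, "mcn");
static_assert(offsetof(struct esca_header_model, reserved60) == 0x60, "reserved60");
static_assert(sizeof(struct esca_header_model) == 0x100, "header size");
```

Splitting a reserved area then only renames the members that actually move, instead of renumbering every later reservedN as this hunk has to.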


      struct esca_entry cpu[KVM_S390_ESCA_CPU_SLOTS];
  };
@@ -228,7 +232,7 @@ struct kvm_s390_sie_block {
      __u8    icptcode;        /* 0x0050 */
      __u8    icptstatus;        /* 0x0051 */
      __u16    ihcpu;            /* 0x0052 */
-    __u8    reserved54;        /* 0x0054 */
+    __u8    mtcr;            /* 0x0054 */
  #define IICTL_CODE_NONE         0x00
  #define IICTL_CODE_MCHK         0x01
  #define IICTL_CODE_EXT         0x02
@@ -247,6 +251,7 @@ struct kvm_s390_sie_block {
  #define ECB_SPECI    0x08
  #define ECB_SRSI    0x04
  #define ECB_HOSTPROTINT    0x02
+#define ECB_PTF        0x01
      __u8    ecb;            /* 0x0061 */
  #define ECB2_CMMA    0x80
  #define ECB2_IEP    0x20
@@ -748,6 +753,7 @@ struct kvm_vcpu_arch {
      bool skey_enabled;
      struct kvm_s390_pv_vcpu pv;
      union diag318_info diag318_info;
+    int prev_cpu;
  };
  struct kvm_vm_stat {

[..]

  }
-void kvm_arch_vcpu_load(struct kvm_vcpu *vcpu, int cpu)
+static void kvm_s390_set_mtcr(struct kvm_vcpu *vcpu)

We change a vcpu-related data structure, so there should be "vcpu" in the 
function name to indicate that.

ok


  {
+    struct esca_block *esca = vcpu->kvm->arch.sca;
+    if (vcpu->arch.sie_block->ecb & ECB_PTF) {

I'm wondering if we should replace these checks with the 
test_kvm_facility() ones. ECB_PTF is never changed after vcpu setup, right?

Sure, it is left over from the first draft, when the patch supported both 
interpretation and interception.


+        ipte_lock(vcpu);
+        WRITE_ONCE(esca->utility, ESCA_UTILITY_MTCR);
+        ipte_unlock(vcpu);
+    }
+}
+
+void kvm_arch_vcpu_load(struct kvm_vcpu *vcpu, int cpu)
+{
      gmap_enable(vcpu->arch.enabled_gmap);
      kvm_s390_set_cpuflags(vcpu, CPUSTAT_RUNNING);
      if (vcpu->arch.cputm_enabled && !is_vcpu_idle(vcpu))
          __start_cpu_timer_accounting(vcpu);
      vcpu->cpu = cpu;
+
+    /*
+     * With PTF interpretation the guest will be aware of topology
+     * change when the Multiprocessor Topology-Change-Report is pending.
+     * We check for events modifying the result of STSI_15_2:
+     * - A new vCPU has been hotplugged (prev_cpu == -1)
+     * - The real CPU backing up the vCPU moved to another socket
+     */
+    if (vcpu->arch.sie_block->ecb & ECB_PTF) {
+        if (vcpu->arch.prev_cpu == -1 ||
+            (topology_physical_package_id(cpu) !=
+             topology_physical_package_id(vcpu->arch.prev_cpu)))

This is barely readable, might be good to put this check in a separate 
function in kvm-s390.h.

ok
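
A rough user-space sketch of how the check could look once it is split out, assuming a named constant for the "new vCPU" marker. PREV_CPU_NONE, vcpu_topology_changed() and model_package_id() are hypothetical names; model_package_id() is only a stand-in for topology_physical_package_id() with an invented mapping of four CPUs per socket.

```c
#include <stdbool.h>

/* Hypothetical name for the "no previous CPU yet" marker (new vCPU). */
#define PREV_CPU_NONE (-1)

/* Stand-in for topology_physical_package_id(): an invented mapping
 * that places four consecutive CPUs in each socket. */
static int model_package_id(int cpu)
{
	return cpu / 4;
}

/* Returns true when the topology change report should be made pending:
 * either the vCPU is new, or it is now backed by a CPU that sits in a
 * different socket than the one it last ran on. */
static bool vcpu_topology_changed(int prev_cpu, int cpu)
{
	if (prev_cpu == PREV_CPU_NONE)
		return true;

	return model_package_id(cpu) != model_package_id(prev_cpu);
}
```

With such a helper, the caller in kvm_arch_vcpu_load() would reduce to a single condition guarding the call that sets the MTCR bit.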


+            kvm_s390_set_mtcr(vcpu);
+    }
  }
  void kvm_arch_vcpu_put(struct kvm_vcpu *vcpu)
  {
+    /* Remember which CPU was backing the vCPU */
+    vcpu->arch.prev_cpu = vcpu->cpu;
      vcpu->cpu = -1;
      if (vcpu->arch.cputm_enabled && !is_vcpu_idle(vcpu))
          __stop_cpu_timer_accounting(vcpu);
@@ -3220,6 +3263,13 @@ static int kvm_s390_vcpu_setup(struct kvm_vcpu *vcpu)
          vcpu->arch.sie_block->ecb |= ECB_HOSTPROTINT;
      if (test_kvm_facility(vcpu->kvm, 9))
          vcpu->arch.sie_block->ecb |= ECB_SRSI;
+
+    /* PTF needs guest facilities to enable interpretation */

Please explain.
How is this different from any other facility a few lines above in this 
function?

It is not; I will remove the comment. Here again it is left over from the 
time the patch supported interception.


+    if (test_kvm_facility(vcpu->kvm, 11))
+        vcpu->arch.sie_block->ecb |= ECB_PTF;
+    /* Set the prev_cpu value to an impossible value to detect a new vcpu */

We can either change this to:
"A prev_value of -1 indicates this is a new vcpu"

Or we define a constant which will also make the check in 
kvm_arch_vcpu_load() easier to understand.

ok, the constant would be clearer.


+    vcpu->arch.prev_cpu = -1;
+
      if (test_kvm_facility(vcpu->kvm, 73))
          vcpu->arch.sie_block->ecb |= ECB_TE;
      if (!kvm_is_ucontrol(vcpu->kvm))
diff --git a/arch/s390/kvm/priv.c b/arch/s390/kvm/priv.c
index 417154b314a6..26d165733496 100644
--- a/arch/s390/kvm/priv.c
+++ b/arch/s390/kvm/priv.c
@@ -861,7 +861,8 @@ static int handle_stsi(struct kvm_vcpu *vcpu)
      if (vcpu->arch.sie_block->gpsw.mask & PSW_MASK_PSTATE)
          return kvm_s390_inject_program_int(vcpu, PGM_PRIVILEGED_OP);
-    if (fc > 3) {
+    if ((fc > 3 && fc != 15) ||
+        (fc == 15 && !test_kvm_facility(vcpu->kvm, 11))) {
          kvm_s390_set_psw_cc(vcpu, 3);
          return 0;
      }

How about:

if (fc > 3 && fc != 15)
     goto out_no_data;

/* fc 15 is provided with PTF/CPU topology support */
if (fc == 15 && !test_kvm_facility(vcpu->kvm, 11))
     goto out_no_data;

ok, clearer
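
To make the intended behaviour explicit, the two checks can be modelled as a pure function in user space. This is only a sketch: stsi_fc_accepted() is a hypothetical name, the boolean parameter stands in for test_kvm_facility(vcpu->kvm, 11), and returning false corresponds to the out_no_data path (condition code 3). It covers only the function code filter, not the rest of handle_stsi().

```c
#include <stdbool.h>

/* Models the function code filter discussed above.
 * has_facility_11 stands in for test_kvm_facility(vcpu->kvm, 11). */
static bool stsi_fc_accepted(int fc, bool has_facility_11)
{
	/* Only fc 0..3 and fc 15 are candidates at all. */
	if (fc > 3 && fc != 15)
		return false;

	/* fc 15 is provided with PTF/CPU topology support. */
	if (fc == 15 && !has_facility_11)
		return false;

	return true;
}
```

Both the combined condition in the patch and the split form reject exactly the same function codes, so the change is purely about readability.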


Thanks for the review,
Pierre

--
Pierre Morel
IBM Lab Boeblingen