Re: [PATCH v2 21/30] KVM: s390: pci: handle refresh of PCI translations

Pierre Morel <pmorel@xxxxxxxxxxxxx> · Wed, 19 Jan 2022 19:25:10 +0100

On 1/19/22 17:39, Matthew Rosato wrote:
On 1/19/22 4:29 AM, Pierre Morel wrote:


On 1/14/22 21:31, Matthew Rosato wrote:
...
+static int dma_table_shadow(struct kvm_vcpu *vcpu, struct zpci_dev *zdev,
+                dma_addr_t dma_addr, size_t size)
+{
+    unsigned int nr_pages = PAGE_ALIGN(size) >> PAGE_SHIFT;
+    struct kvm_zdev *kzdev = zdev->kzdev;
+    unsigned long *entry, *gentry;
+    int i, rc = 0, rc2;
+
+    if (!nr_pages || !kzdev)
+        return -EINVAL;
+
+    mutex_lock(&kzdev->ioat.lock);
+    if (!zdev->dma_table || !kzdev->ioat.head[0]) {
+        rc = -EINVAL;
+        goto out_unlock;
+    }
+
+    for (i = 0; i < nr_pages; i++) {
+        gentry = dma_walk_guest_cpu_trans(vcpu, &kzdev->ioat, dma_addr);
+        if (!gentry)
+            continue;
+        entry = dma_walk_cpu_trans(zdev->dma_table, dma_addr);
+
+        if (!entry) {
+            rc = -ENOMEM;
+            goto out_unlock;
+        }
+
+        rc2 = dma_shadow_cpu_trans(vcpu, entry, gentry);
+        if (rc2 < 0) {
+            rc = -EIO;
+            goto out_unlock;
+        }
+        dma_addr += PAGE_SIZE;
+        rc += rc2;
+    }
+

In case of an error, shouldn't we invalidate the shadow table entries 
that we validated before the error occurred?

Hmm, I don't think this is strictly necessary - the status returned 
should indicate the specified DMA range is now in an indeterminate state 
(putting the onus on the guest to take corrective action via a global 
refresh).

In fact I think I screwed that up below in kvm_s390_pci_refresh_trans, 
the fabricated status should always be KVM_S390_RPCIT_INS_RES.

OK
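
To make the agreed behaviour concrete, here is a minimal standalone 
sketch of what "always KVM_S390_RPCIT_INS_RES" could look like. The 
helper name fabricate_rpcit_status() is hypothetical and is not part of 
the patch; only the constant value is taken from the pci.h hunk further 
down.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Value copied from the pci.h hunk in this patch. */
#define KVM_S390_RPCIT_INS_RES 0x10

/*
 * Hypothetical helper (not in the patch): any shadowing failure is
 * reported to the guest as "insufficient resources", which tells the
 * guest that the DMA range is indeterminate and needs a global refresh.
 * Returns 1 if a status was fabricated, 0 if rc did not indicate an error.
 */
static int fabricate_rpcit_status(int rc, uint8_t *status)
{
	assert(status != NULL);

	if (rc >= 0)
		return 0;

	*status = KVM_S390_RPCIT_INS_RES;
	return 1;
}
```

With this shape the caller no longer needs to distinguish -ENOMEM from 
-EIO, which is the point of dropping the switch statement.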



+out_unlock:
+    mutex_unlock(&kzdev->ioat.lock);
+    return rc;
+}
+
+int kvm_s390_pci_refresh_trans(struct kvm_vcpu *vcpu, unsigned long req,
+                   unsigned long start, unsigned long size,
+                   u8 *status)
+{
+    struct zpci_dev *zdev;
+    u32 fh = req >> 32;
+    int rc;
+
+    /* Make sure this is a valid device associated with this guest */
+    zdev = get_zdev_by_fh(fh);
+    if (!zdev || !zdev->kzdev || zdev->kzdev->kvm != vcpu->kvm) {
+        *status = 0;

Wouldn't it be useful to add some debug information here?
When could this happen?

Yes, I agree -- One of the follow-ons I'd like to add after this series 
is s390dbf entries; this seems like a good spot for one.

As to when this could happen: it should not under normal circumstances, 
but consider something like arbitrary function handles coming from the 
intercepted guest instruction.  We need to ensure that the specified 
function 1) exists and 2) is associated with the guest issuing the refresh.
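
As a standalone illustration of those two conditions, the sketch below 
models the check with stand-in types. The struct layouts are reduced to 
the fields the check touches, and the helper names rpcit_req_to_fh() and 
zdev_owned_by() are hypothetical; the real lookup in the patch is 
get_zdev_by_fh().

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-in types: only the fields used by the check are modelled. */
struct kvm { int id; };
struct kvm_zdev { struct kvm *kvm; };
struct zpci_dev { uint32_t fh; struct kvm_zdev *kzdev; };

/* The function handle sits in the upper 32 bits of the request. */
static uint32_t rpcit_req_to_fh(uint64_t req)
{
	return (uint32_t)(req >> 32);
}

/*
 * True only if the device exists (non-NULL lookup result), has KVM
 * state attached, and that state belongs to the guest issuing the
 * refresh.
 */
static bool zdev_owned_by(const struct zpci_dev *zdev, const struct kvm *kvm)
{
	return zdev && zdev->kzdev && zdev->kzdev->kvm == kvm;
}
```

A handle that fails any of the three tests is rejected the same way, so 
a guest cannot tell a non-existent function from one owned by another 
guest.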


Also, if we hit this error it looks like we have a VM problem. 
Shouldn't we handle this in QEMU and return -EOPNOTSUPP?


Well, I'm not sure if we can really tell where the problem is (it could 
for example indicate a misbehaving guest, or a bug in our KVM tracking 
of hostdevs).

The guest chose the function handle, and if we got here then that means 
it doesn't indicate that it's an emulated device, which means either we 
are using the assist and KVM should handle the intercept or we are not 
and userspace should handle it.  But in both of those cases, there 
should be a host device and it should be associated with the guest.

That is right if we cannot find an associated zdev = F(fh), 
but the two other errors are KVM or QEMU errors AFAIU.


I think if we decide to throw this to userspace in this event, QEMU 
needs some extra code to handle it (basically, if QEMU receives the 
intercept and the device is neither emulated nor using intercept mode 
then we must treat it as an invalid handle, because this intercept 
should have been handled by KVM).
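
The rule in parentheses can be written down as a small decision 
function. This is only a sketch of the logic described above, not 
existing QEMU code; the enum, its values, and the function name are all 
hypothetical.

```c
#include <stdbool.h>

/* Hypothetical outcomes for an RPCIT intercept that reaches userspace. */
enum rpcit_action {
	RPCIT_EMULATE,        /* emulated device: userspace owns the translations */
	RPCIT_HANDLE_HERE,    /* host device in intercept mode (no assist) */
	RPCIT_INVALID_HANDLE, /* KVM should have handled it: reject the handle */
};

/*
 * Decide what userspace does with the intercept. A device that is
 * neither emulated nor in intercept mode is using the assist, so the
 * intercept should never have reached userspace.
 */
static enum rpcit_action rpcit_userspace_action(bool emulated,
						bool intercept_mode)
{
	if (emulated)
		return RPCIT_EMULATE;
	if (intercept_mode)
		return RPCIT_HANDLE_HERE;
	return RPCIT_INVALID_HANDLE;
}
```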

I do not want to start a discussion on this. I think we can leave it 
like this for now and come back to it when we have a good idea of how 
to handle it.
Maybe just add a /* TODO */




+        return -EINVAL;
+    }
+
+    /* Only proceed if the device is using the assist */
+    if (zdev->kzdev->ioat.head[0] == 0)
+        return -EOPNOTSUPP;
+
+    rc = dma_table_shadow(vcpu, zdev, start, size);
+    if (rc < 0) {
+        /*
+         * If errors encountered during shadow operations, we must
+         * fabricate status to present to the guest
+         */
+        switch (rc) {
+        case -ENOMEM:
+            *status = KVM_S390_RPCIT_INS_RES;
+            break;
+        default:
+            *status = KVM_S390_RPCIT_ERR;
+            break;

As mentioned above I think this switch statement should go away and 
instead always set KVM_S390_RPCIT_INS_RES when rc < 0.

+        }
+    } else if (rc > 0) {
+        /* Host RPCIT must be issued */
+        rc = zpci_refresh_trans((u64) zdev->fh << 32, start, size,
+                    status);
+    }
+    zdev->kzdev->rpcit_count++;
+
+    return rc;
+}
+
  /* Modify PCI: Register floating adapter interruption forwarding */
  static int kvm_zpci_set_airq(struct zpci_dev *zdev)
  {
@@ -620,6 +822,8 @@ EXPORT_SYMBOL_GPL(kvm_s390_pci_attach_kvm);
  int kvm_s390_pci_init(void)
  {
+    int rc;
+
      aift = kzalloc(sizeof(struct zpci_aift), GFP_KERNEL);
      if (!aift)
          return -ENOMEM;
@@ -627,5 +831,7 @@ int kvm_s390_pci_init(void)
      spin_lock_init(&aift->gait_lock);
      mutex_init(&aift->lock);
-    return 0;
+    rc = zpci_get_mdd(&aift->mdd);
+
+    return rc;
  }

diff --git a/arch/s390/kvm/pci.h b/arch/s390/kvm/pci.h
index 54355634df82..bb2be7fc3934 100644
--- a/arch/s390/kvm/pci.h
+++ b/arch/s390/kvm/pci.h
@@ -18,6 +18,9 @@
  #define KVM_S390_PCI_DTSM_MASK 0x40
+#define KVM_S390_RPCIT_INS_RES 0x10
+#define KVM_S390_RPCIT_ERR 0x28
+
  struct zpci_gaite {
      u32 gisa;
      u8 gisc;
@@ -33,6 +36,7 @@ struct zpci_aift {
      struct kvm_zdev **kzdev;
      spinlock_t gait_lock; /* Protects the gait, used during AEN forward */
      struct mutex lock; /* Protects the other structures in aift */
+    u32 mdd;
  };
  extern struct zpci_aift *aift;
@@ -47,7 +51,9 @@ static inline struct kvm *kvm_s390_pci_si_to_kvm(struct zpci_aift *aift,
  int kvm_s390_pci_aen_init(u8 nisc);
  void kvm_s390_pci_aen_exit(void);
-
+int kvm_s390_pci_refresh_trans(struct kvm_vcpu *vcpu, unsigned long req,
+                   unsigned long start, unsigned long end,
+                   u8 *status);
  int kvm_s390_pci_init(void);
  #endif /* __KVM_S390_PCI_H */




--
Pierre Morel
IBM Lab Boeblingen