On 20.04.23 at 18:01, Claudio Imbrenda wrote:
On machines without the Destroy Secure Configuration Fast UVC, the
topmost level of page tables is set aside and freed asynchronously
as the last step of the asynchronous teardown.
Each gmap has a host_to_guest radix tree mapping host (userspace)
addresses (with 1M granularity) to gmap segment table entries (pmds).
If a guest is smaller than 2GB, the topmost level of page tables is the
segment table (i.e. there are only 2 levels). Replacing it means that
the pointers in the host_to_guest mapping would become stale and cause
all kinds of nasty issues.
This patch fixes the issue by synchronously destroying all guests with
only 2 levels of page tables in kvm_s390_pv_set_aside. This will
speed up the process and avoid the issue altogether.
Update s390_replace_asce so it refuses to replace segment type ASCEs.
Signed-off-by: Claudio Imbrenda <imbrenda@xxxxxxxxxxxxx>
Fixes: fb491d5500a7 ("KVM: s390: pv: asynchronous destroy for reboot")
---
arch/s390/kvm/pv.c | 35 ++++++++++++++++++++---------------
arch/s390/mm/gmap.c | 7 +++++++
2 files changed, 27 insertions(+), 15 deletions(-)
diff --git a/arch/s390/kvm/pv.c b/arch/s390/kvm/pv.c
index e032ebbf51b9..ceb8cb628d62 100644
--- a/arch/s390/kvm/pv.c
+++ b/arch/s390/kvm/pv.c
@@ -39,6 +39,7 @@ struct pv_vm_to_be_destroyed {
u64 handle;
void *stor_var;
unsigned long stor_base;
+ bool small;
};
static void kvm_s390_clear_pv_state(struct kvm *kvm)
@@ -318,7 +319,11 @@ int kvm_s390_pv_set_aside(struct kvm *kvm, u16 *rc, u16 *rrc)
if (!priv)
return -ENOMEM;
- if (is_destroy_fast_available()) {
+ if ((kvm->arch.gmap->asce & _ASCE_TYPE_MASK) == _ASCE_TYPE_SEGMENT) {
+ /* No need to do things asynchronously for VMs under 2GB */
+ res = kvm_s390_pv_deinit_vm(kvm, rc, rrc);
+ priv->small = true;
+ } else if (is_destroy_fast_available()) {
res = kvm_s390_pv_deinit_vm_fast(kvm, rc, rrc);
} else {
priv->stor_var = kvm->arch.pv.stor_var;
@@ -335,7 +340,8 @@ int kvm_s390_pv_set_aside(struct kvm *kvm, u16 *rc, u16 *rrc)
return res;
}
- kvm_s390_destroy_lower_2g(kvm);
+ if (!priv->small)
+ kvm_s390_destroy_lower_2g(kvm);
kvm_s390_clear_pv_state(kvm);
kvm->arch.pv.set_aside = priv;
@@ -418,7 +424,10 @@ int kvm_s390_pv_deinit_cleanup_all(struct kvm *kvm, u16 *rc, u16 *rrc)
/* If a previous protected VM was set aside, put it in the need_cleanup list */
if (kvm->arch.pv.set_aside) {
- list_add(kvm->arch.pv.set_aside, &kvm->arch.pv.need_cleanup);
+ if (((struct pv_vm_to_be_destroyed *)kvm->arch.pv.set_aside)->small)
Why do we need a cast here?
+ kfree(kvm->arch.pv.set_aside);
+ else
+ list_add(kvm->arch.pv.set_aside, &kvm->arch.pv.need_cleanup);
kvm->arch.pv.set_aside = NULL;
}
With the comment added that Marc asked for
Acked-by: Christian Borntraeger <borntraeger@xxxxxxxxxxxxx>
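
To illustrate the cast question above, here is a minimal userspace sketch, not kernel code. It assumes that kvm->arch.pv.set_aside is declared as an untyped void pointer, which would explain why the patch casts before reading ->small. Assigning it to a typed local variable first would avoid the inline cast. All names below (set_aside_vm, pv_state, queue_or_free_set_aside, asce_is_segment_type) are hypothetical stand-ins, the singly linked list replaces the kernel's list_head, and the ASCE constants are assumed to mirror the kernel's designation-type bits.

```c
#include <stdbool.h>
#include <stdlib.h>

/* Assumed to mirror the kernel's ASCE designation-type bits (unverified here). */
#define ASCE_TYPE_MASK    0x0cUL
#define ASCE_TYPE_SEGMENT 0x00UL

/* Hypothetical stand-in for struct pv_vm_to_be_destroyed. */
struct set_aside_vm {
	struct set_aside_vm *next;	/* stand-in for the kernel's list_head */
	bool small;			/* true if the VM was destroyed synchronously */
};

/* Hypothetical stand-in for the relevant part of kvm->arch.pv. */
struct pv_state {
	void *set_aside;			/* assumed to be an untyped pointer */
	struct set_aside_vm *need_cleanup;	/* VMs still waiting for teardown */
};

/* A segment-type ASCE means the segment table is the topmost level. */
static bool asce_is_segment_type(unsigned long asce)
{
	return (asce & ASCE_TYPE_MASK) == ASCE_TYPE_SEGMENT;
}

/*
 * Free a small set-aside VM right away, otherwise queue it for cleanup.
 * Returns true only if the entry was queued.
 */
static bool queue_or_free_set_aside(struct pv_state *pv)
{
	/* A typed local converts from void * implicitly, so no cast is needed. */
	struct set_aside_vm *cur = pv->set_aside;
	bool queued = false;

	if (!cur)
		return false;

	if (cur->small) {
		free(cur);
	} else {
		cur->next = pv->need_cleanup;
		pv->need_cleanup = cur;
		queued = true;
	}
	pv->set_aside = NULL;
	return queued;
}
```

Whether a local variable reads better than the inline cast is a style choice for the patch author. The behaviour is the same either way.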