Claudio Imbrenda <imbrenda@xxxxxxxxxxxxx> writes:

> On machines without the Destroy Secure Configuration Fast UVC, the
> topmost level of page tables is set aside and freed asynchronously
> as last step of the asynchronous teardown.
>
> Each gmap has a host_to_guest radix tree mapping host (userspace)
> addresses (with 1M granularity) to gmap segment table entries (pmds).
>
> If a guest is smaller than 2GB, the topmost level of page tables is the
> segment table (i.e. there are only 2 levels). Replacing it means that
> the pointers in the host_to_guest mapping would become stale and cause
> all kinds of nasty issues.
>
> This patch fixes the issue by disallowing asynchronous teardown for
> guests with only 2 levels of page tables. Userspace should (and already
> does) try using the normal destroy if the asynchronous one fails.
>
> Update s390_replace_asce so it refuses to replace segment type ASCEs.
> This is still needed in case the normal destroy VM fails.
>
> Signed-off-by: Claudio Imbrenda <imbrenda@xxxxxxxxxxxxx>
> Fixes: fb491d5500a7 ("KVM: s390: pv: asynchronous destroy for reboot")
> ---
>  arch/s390/kvm/pv.c  | 5 +++++
>  arch/s390/mm/gmap.c | 7 +++++++
>  2 files changed, 12 insertions(+)
>
> diff --git a/arch/s390/kvm/pv.c b/arch/s390/kvm/pv.c
> index e032ebbf51b9..3ce5f4351156 100644
> --- a/arch/s390/kvm/pv.c
> +++ b/arch/s390/kvm/pv.c
> @@ -314,6 +314,11 @@ int kvm_s390_pv_set_aside(struct kvm *kvm, u16 *rc, u16 *rrc)
>  	 */
>  	if (kvm->arch.pv.set_aside)
>  		return -EINVAL;
> +
> +	/* Guest with segment type ASCE, refuse to destroy asynchronously */
> +	if ((kvm->arch.gmap->asce & _ASCE_TYPE_MASK) == _ASCE_TYPE_SEGMENT)
> +		return -EINVAL;
> +
>  	priv = kzalloc(sizeof(*priv), GFP_KERNEL);
>  	if (!priv)
>  		return -ENOMEM;
> diff --git a/arch/s390/mm/gmap.c b/arch/s390/mm/gmap.c
> index 5a716bdcba05..2267cf9819b2 100644
> --- a/arch/s390/mm/gmap.c
> +++ b/arch/s390/mm/gmap.c
> @@ -2833,6 +2833,9 @@ EXPORT_SYMBOL_GPL(s390_unlist_old_asce);
>   * s390_replace_asce - Try to replace the current ASCE of a gmap with a copy
>   * @gmap: the gmap whose ASCE needs to be replaced
>   *
> + * If the ASCE is a SEGMENT type then this function will return -EINVAL,
> + * otherwise the pointers in the host_to_guest radix tree will keep pointing
> + * to the wrong pages, causing use-after-free and memory corruption.
>   * If the allocation of the new top level page table fails, the ASCE is not
>   * replaced.
>   * In any case, the old ASCE is always removed from the gmap CRST list.
> @@ -2847,6 +2850,10 @@ int s390_replace_asce(struct gmap *gmap)
>
>  	s390_unlist_old_asce(gmap);
>
> +	/* Replacing segment type ASCEs would cause serious issues */
> +	if ((gmap->asce & _ASCE_TYPE_MASK) == _ASCE_TYPE_SEGMENT)
> +		return -EINVAL;

As discussed... not sure if this is a valid scenario or if it can be
considered a bug if it happens.

> +
>  	page = alloc_pages(GFP_KERNEL_ACCOUNT, CRST_ALLOC_ORDER);
>  	if (!page)
>  		return -ENOMEM;
> --
> 2.40.0

IMO, much better.

Reviewed-by: Marc Hartmayer <mhartmay@xxxxxxxxxxxxx>