On Thu, 29 Jul 2021 12:41:05 +0200
Janosch Frank <frankja@xxxxxxxxxxxxx> wrote:

> On 7/28/21 4:26 PM, Claudio Imbrenda wrote:
> > When a protected VM is created, the topmost level of page tables of
> > its ASCE is marked by the Ultravisor; any attempt to use that
> > memory for protected virtualization will result in failure.
> >
> > Only a successful Destroy Configuration UVC will remove the marking.
> >
> > When the Destroy Configuration UVC fails, the topmost level of page
> > tables of the VM does not get its marking cleared; to avoid issues
> > it must not be used again.
> >
> > Since the page becomes in practice unusable, we set it aside and
> > leak it.
> >
> > Signed-off-by: Claudio Imbrenda <imbrenda@xxxxxxxxxxxxx>
> > ---
> >  arch/s390/kvm/pv.c | 53 +++++++++++++++++++++++++++++++++++++++++++++-
> >  1 file changed, 52 insertions(+), 1 deletion(-)
> >
> > diff --git a/arch/s390/kvm/pv.c b/arch/s390/kvm/pv.c
> > index e007df11a2fe..1ecdc1769ed9 100644
> > --- a/arch/s390/kvm/pv.c
> > +++ b/arch/s390/kvm/pv.c
> > @@ -155,6 +155,55 @@ static int kvm_s390_pv_alloc_vm(struct kvm *kvm)
> >  	return -ENOMEM;
> >  }
> >
> > +/*
> > + * Remove the topmost level of page tables from the list of page tables of
> > + * the gmap.
> > + * This means that it will not be freed when the VM is torn down, and needs
> > + * to be handled separately by the caller, unless an intentional leak is
> > + * intended.
> > + */
> > +static void kvm_s390_pv_remove_old_asce(struct kvm *kvm)
> > +{
> > +	struct page *old;
> > +
> > +	old = virt_to_page(kvm->arch.gmap->table);
> > +	list_del(&old->lru);
> > +	/* in case the ASCE needs to be "removed" multiple times */
> > +	INIT_LIST_HEAD(&old->lru);
> > +}
> > +
> > +/*
> > + * Try to replace the current ASCE with another equivalent one.
> > + * If the allocation of the new top level page table fails, the ASCE is not
> > + * replaced.
> > + * In any case, the old ASCE is removed from the list, therefore the caller
> > + * has to make sure to save a pointer to it beforehands, unless an
> > + * intentional leak is intended.
> > + */
> > +static int kvm_s390_pv_replace_asce(struct kvm *kvm)
> > +{
> > +	unsigned long asce;
> > +	struct page *page;
> > +	void *table;
> > +
> > +	kvm_s390_pv_remove_old_asce(kvm);
> > +
> > +	page = alloc_pages(GFP_KERNEL_ACCOUNT, CRST_ALLOC_ORDER);
> > +	if (!page)
> > +		return -ENOMEM;
> > +	list_add(&page->lru, &kvm->arch.gmap->crst_list);
> > +
> > +	table = page_to_virt(page);
> > +	memcpy(table, kvm->arch.gmap->table, 1UL << (CRST_ALLOC_ORDER + PAGE_SHIFT));
>
> Don't we want to memcpy first and then add it to the list?
> The gmap is still active per-se so I think we want to take the
> guest_table_lock for the list_add here.

doesn't really make a difference, it is not actually used until a few
lines later

also, the list is only ever touched here, during guest creation and
destruction; IIRC in all those cases we hold kvm->lock

> > +
> > +	asce = (kvm->arch.gmap->asce & ~PAGE_MASK) | __pa(table);
> > +	WRITE_ONCE(kvm->arch.gmap->asce, asce);
> > +	WRITE_ONCE(kvm->mm->context.gmap_asce, asce);
> > +	WRITE_ONCE(kvm->arch.gmap->table, table);
>
> If I remember correctly those won't need locks but I'm not 100% sure
> so please have a look at that.

it should not need locks, the VM is in use, so it can't disappear under
our feet.

> > +
> > +	return 0;
> > +}
>
> That should both be in gmap.c

why?

> > +
> >  /* this should not fail, but if it does, we must not free the donated memory */
> >  int kvm_s390_pv_deinit_vm(struct kvm *kvm, u16 *rc, u16 *rrc)
> >  {
> > @@ -169,9 +218,11 @@ int kvm_s390_pv_deinit_vm(struct kvm *kvm, u16 *rc, u16 *rrc)
> >  	atomic_set(&kvm->mm->context.is_protected, 0);
> >  	KVM_UV_EVENT(kvm, 3, "PROTVIRT DESTROY VM: rc %x rrc %x", *rc, *rrc);
> >  	WARN_ONCE(cc, "protvirt destroy vm failed rc %x rrc %x", *rc, *rrc);
> > -	/* Inteded memory leak on "impossible" error */
> > +	/* Intended memory leak on "impossible" error */
> >  	if (!cc)
> >  		kvm_s390_pv_dealloc_vm(kvm);
> > +	else
> > +		kvm_s390_pv_replace_asce(kvm);
> >  	return cc ? -EIO : 0;
> >  }
> >
>
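
For anyone following along, the ASCE update in kvm_s390_pv_replace_asce()
comes down to keeping the low control bits of the old ASCE and swapping in
the physical address of the new top-level table. A minimal userspace sketch
of just that arithmetic, assuming 4 KiB pages; the macros only mirror the
kernel ones and asce_with_new_table() is a hypothetical helper name, not
something in the patch:

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: 4 KiB pages; these mirror the kernel macros of the same name. */
#define PAGE_SHIFT 12
#define PAGE_SIZE  (UINT64_C(1) << PAGE_SHIFT)
#define PAGE_MASK  (~(PAGE_SIZE - 1))

/*
 * Hypothetical helper: keep the bits below the page boundary from the
 * old ASCE and replace the table origin with new_table_pa, as in
 *   asce = (old_asce & ~PAGE_MASK) | __pa(table);
 */
static uint64_t asce_with_new_table(uint64_t old_asce, uint64_t new_table_pa)
{
	/* the new table origin must be page aligned */
	assert((new_table_pa & ~PAGE_MASK) == 0);
	return (old_asce & ~PAGE_MASK) | new_table_pa;
}
```

This only illustrates the bit manipulation; it says nothing about the
locking or ordering questions discussed above.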