On Wed, Nov 13, 2019 at 08:59:08PM +0530, Bharata B Rao wrote:
> On Tue, Nov 12, 2019 at 04:34:34PM +1100, Paul Mackerras wrote:
> > On Mon, Nov 04, 2019 at 09:47:58AM +0530, Bharata B Rao wrote:
> > [snip]
> > > @@ -5442,6 +5471,64 @@ static int kvmhv_store_to_eaddr(struct kvm_vcpu *vcpu, ulong *eaddr, void *ptr,
> > >  	return rc;
> > >  }
> > >  
> > > +/*
> > > + * IOCTL handler to turn off secure mode of guest
> > > + *
> > > + * - Issue ucall to terminate the guest on the UV side
> > > + * - Unpin the VPA pages (Enables these pages to be migrated back
> > > + *   when VM becomes secure again)
> > > + * - Recreate partition table as the guest is transitioning back to
> > > + *   normal mode
> > > + * - Release all device pages
> > > + */
> > > +static int kvmhv_svm_off(struct kvm *kvm)
> > > +{
> > > +	struct kvm_vcpu *vcpu;
> > > +	int srcu_idx;
> > > +	int ret = 0;
> > > +	int i;
> > > +
> > > +	if (!(kvm->arch.secure_guest & KVMPPC_SECURE_INIT_START))
> > > +		return ret;
> > > +
> >
> > A further comment on this code: it should check that no vcpus are
> > running and fail if any are running, and it should prevent any vcpus
> > from running until the function is finished, using code like that in
> > kvmhv_configure_mmu(). That is, it should do something like this:
> >
> > 	mutex_lock(&kvm->arch.mmu_setup_lock);
> > 	mmu_was_ready = kvm->arch.mmu_ready;
> > 	if (kvm->arch.mmu_ready) {
> > 		kvm->arch.mmu_ready = 0;
> > 		/* order mmu_ready vs. vcpus_running */
> > 		smp_mb();
> > 		if (atomic_read(&kvm->arch.vcpus_running)) {
> > 			kvm->arch.mmu_ready = 1;
> > 			ret = -EBUSY;
> > 			goto out_unlock;
> > 		}
> > 	}
> >
> > and then after clearing kvm->arch.secure_guest below:
> >
> > > +	srcu_idx = srcu_read_lock(&kvm->srcu);
> > > +	for (i = 0; i < KVM_ADDRESS_SPACE_NUM; i++) {
> > > +		struct kvm_memory_slot *memslot;
> > > +		struct kvm_memslots *slots = __kvm_memslots(kvm, i);
> > > +
> > > +		if (!slots)
> > > +			continue;
> > > +
> > > +		kvm_for_each_memslot(memslot, slots) {
> > > +			kvmppc_uvmem_drop_pages(memslot, kvm, true);
> > > +			uv_unregister_mem_slot(kvm->arch.lpid, memslot->id);
> > > +		}
> > > +	}
> > > +	srcu_read_unlock(&kvm->srcu, srcu_idx);
> > > +
> > > +	ret = uv_svm_terminate(kvm->arch.lpid);
> > > +	if (ret != U_SUCCESS) {
> > > +		ret = -EINVAL;
> > > +		goto out;
> > > +	}
> > > +
> > > +	kvm_for_each_vcpu(i, vcpu, kvm) {
> > > +		spin_lock(&vcpu->arch.vpa_update_lock);
> > > +		unpin_vpa_reset(kvm, &vcpu->arch.dtl);
> > > +		unpin_vpa_reset(kvm, &vcpu->arch.slb_shadow);
> > > +		unpin_vpa_reset(kvm, &vcpu->arch.vpa);
> > > +		spin_unlock(&vcpu->arch.vpa_update_lock);
> > > +	}
> > > +
> > > +	ret = kvmppc_reinit_partition_table(kvm);
> > > +	if (ret)
> > > +		goto out;
> > > +
> > > +	kvm->arch.secure_guest = 0;
> >
> > you need to do:
> >
> > 	kvm->arch.mmu_ready = mmu_was_ready;
> > out_unlock:
> > 	mutex_unlock(&kvm->arch.mmu_setup_lock);
> >
> > > +out:
> > > +	return ret;
> > > +}
> > > +
> >
> > With that extra check in place, it should be safe to unpin the vpas if
> > there is a good reason to do so. ("Userspace has some bug that we
> > haven't found" isn't a good reason to do so.)
>
> QEMU indeed does set_one_reg to reset the VPAs but that only marks
> the VPA update as pending. The actual unpinning happens when vcpu
> gets to run after reset at which time the VPAs are updated after
> any unpinning (if required)
>
> When secure guest reboots, vpu 0 gets to run and does unpin its
> VPA pages and then proceeds with switching to secure. Here UV
> tries to page-in all the guest pages, including the still pinned
> VPA pages corresponding to other vcpus which haven't had a chance
> to run till now. They are all still pinned and hence page-in fails.
>
> To prevent this, we have to explicitly unpin the VPA pages during
> this svm off ioctl. This will ensure that SMP secure guest is able
> to reboot correctly.

OK, that makes sense. Please put a comment in the code explaining
this briefly.

> So I will incorporate the code chunk you have shown above to fail
> if any vcpu is running and prevent any vcpu from running when
> we unpin VPAs from this ioctl.

Sounds good.

Paul.
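
For readers who want to experiment with the quiescing pattern discussed
above outside the kernel, here is a minimal userspace sketch. It is only
a model, not the patch itself: struct vm_model, vcpu_enter(), vcpu_exit()
and svm_off_model() are hypothetical names, a pthread mutex stands in for
mmu_setup_lock, and sequentially consistent C11 atomics stand in for
smp_mb(). The vcpu entry path is an approximation that assumes a vcpu
bumps vcpus_running before it looks at mmu_ready.

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for the few kvm->arch fields the pattern uses. */
struct vm_model {
	pthread_mutex_t mmu_setup_lock;
	atomic_bool mmu_ready;
	atomic_int vcpus_running;
	unsigned int secure_guest;
};

/*
 * vcpu side (assumed ordering): announce "running" first, then check
 * mmu_ready. The seq_cst atomics give the full ordering that smp_mb()
 * provides in the kernel.
 */
static int vcpu_enter(struct vm_model *vm)
{
	atomic_fetch_add(&vm->vcpus_running, 1);
	if (!atomic_load(&vm->mmu_ready)) {
		/* This model just backs out instead of redoing MMU setup. */
		atomic_fetch_sub(&vm->vcpus_running, 1);
		return -EAGAIN;
	}
	return 0;
}

static void vcpu_exit(struct vm_model *vm)
{
	int prev = atomic_fetch_sub(&vm->vcpus_running, 1);

	assert(prev > 0);	/* exit must pair with a successful enter */
	(void)prev;
}

/* ioctl side: fail if any vcpu is running, and keep vcpus out until done. */
static int svm_off_model(struct vm_model *vm)
{
	bool mmu_was_ready;
	int ret = 0;

	pthread_mutex_lock(&vm->mmu_setup_lock);
	mmu_was_ready = atomic_load(&vm->mmu_ready);
	if (mmu_was_ready) {
		atomic_store(&vm->mmu_ready, false);
		/* order mmu_ready vs. vcpus_running (seq_cst store, then load) */
		if (atomic_load(&vm->vcpus_running)) {
			atomic_store(&vm->mmu_ready, true);
			ret = -EBUSY;
			goto out_unlock;
		}
	}

	/* Page release, VPA unpinning and table re-init would go here. */
	vm->secure_guest = 0;

	atomic_store(&vm->mmu_ready, mmu_was_ready);
out_unlock:
	pthread_mutex_unlock(&vm->mmu_setup_lock);
	return ret;
}
```

With one modelled vcpu inside the guest, svm_off_model() returns -EBUSY
and leaves secure_guest and mmu_ready untouched; once that vcpu has
called vcpu_exit(), the call succeeds and mmu_ready is restored to its
previous value.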