On Tue, Jun 02, 2020 at 03:36:39PM +0530, Bharata B Rao wrote:
> On Mon, Jun 01, 2020 at 12:05:35PM -0700, Ram Pai wrote:
> > On Mon, Jun 01, 2020 at 05:25:18PM +0530, Bharata B Rao wrote:
> > > On Sat, May 30, 2020 at 07:27:50PM -0700, Ram Pai wrote:
> > > > H_SVM_INIT_DONE incorrectly assumes that the Ultravisor has explicitly
> > > > called H_SVM_PAGE_IN for all secure pages.
> > >
> > > I don't think that is quite true. HV doesn't assume anything about
> > > secure pages by itself.
> >
> > Yes. Currently, it does not assume anything about secure pages. But I am
> > proposing that it should consider all pages (except the shared pages) as
> > secure pages, when H_SVM_INIT_DONE is called.
>
> Ok, then maybe also add the proposed changes to the H_SVM_INIT_DONE
> documentation.

Ok.

> > In other words, HV should treat all pages, except shared pages, as
> > secure pages once H_SVM_INIT_DONE is called. And this includes pages
> > added subsequently through memory hotplug.
>
> So after H_SVM_INIT_DONE, if HV touches a secure page for any
> reason and gets encrypted contents via page-out, HV drops the
> device pfn at that time. So what state would we be in, in that case? We
> have completed H_SVM_INIT_DONE, but still have a normal (but encrypted)
> page in HV?

Good point. The corresponding GFN will continue to be a secure GFN; just
that its backing PFN is not a device PFN but a memory PFN, and that
backing memory PFN contains encrypted content.

I will clarify this secure-GFN state in the patch; see also the rough
state sketch further down in this mail.

> > Yes, the Ultravisor can explicitly request the HV to move the pages
> > individually. But that will slow down the transition significantly.
> > It takes above 20min to transition them, for an SVM of size 100G.
> >
> > With this proposed enhancement, the switch completes in a few seconds.
>
> I think many pages during the initial switch and most pages for hotplugged
> memory are zero pages, for which we don't anyway issue UV page-in calls.
> So the 20min saving you are observing is purely due to hcall overhead?

That appears to be the case.

> How about extending the H_SVM_PAGE_IN interface or a new hcall to request
> multiple pages in one request?
>
> Also, how about requesting bigger page sizes (2M)? Ralph Campbell
> had patches that added THP support for migrate_vma_* calls.

Yes, that should give a further boost. I think the API does not stop us
from using that feature; what is missing is the support on the
Ultravisor side. Hopefully we will see contributions to the Ultravisor
once it is open-sourced.

> > > > These GFNs continue to be
> > > > normal GFNs associated with normal PFNs; when in fact, these GFNs should
> > > > have been secure GFNs associated with device PFNs.
> > >
> > > Transition to secure state is driven by SVM/UV and HV just responds to
> > > hcalls by issuing appropriate uvcalls. SVM/UV is in the best position to
> > > determine the required pages that need to be moved into the secure side.
> > > HV just responds to it and tracks such pages as device private pages.
> > >
> > > If SVM/UV doesn't get all the pages into the secure side by the time
> > > of H_SVM_INIT_DONE, the remaining pages are just normal (shared or
> > > otherwise) pages as far as HV is concerned. Why should HV assume that
> > > SVM/UV didn't ask for a few pages and hence push those pages during
> > > H_SVM_INIT_DONE?
> >
> > By definition, an SVM is a VM backed by secure pages.
> > Hence all pages (except shared) must turn secure when a VM switches to SVM.
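
To make this concrete, here is a rough sketch of the per-GFN states I
have in mind. The names and encoding below are purely illustrative (my
own, not from the posted patch); it also captures the paged-out case
you raised above, where a secure GFN ends up backed by a memory PFN
holding encrypted contents:

/* Illustrative sketch only -- names are not from the posted patch. */
enum kvmppc_gfn_state {
	KVMPPC_GFN_NORMAL,	  /* not yet paged in or pushed secure */
	KVMPPC_GFN_SHARED,	  /* explicitly shared between HV and SVM */
	KVMPPC_GFN_SECURE_DEVICE, /* secure, backed by a device PFN */
	KVMPPC_GFN_SECURE_MEM,	  /* secure, but paged out: backed by a
				   * memory PFN with encrypted contents */
};

/*
 * Proposed invariant: after H_SVM_INIT_DONE, no GFN of the SVM is
 * KVMPPC_GFN_NORMAL; each is either SHARED or one of the two SECURE
 * states. An HV page-out moves SECURE_DEVICE to SECURE_MEM, never
 * back to NORMAL.
 */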
> >
> > UV is interested in only certain pages of the VM, which it will
> > request explicitly through H_SVM_PAGE_IN. All other pages need not
> > be paged in through UV_PAGE_IN; they just need to be switched to
> > device pages.
> >
> > > I think UV should drive the movement of pages into the secure side,
> > > both of boot-time SVM memory and of hot-plugged memory. HV does the
> > > memslot registration uvcall when new memory is plugged in; UV should
> > > explicitly get the required pages in at that time instead of expecting
> > > HV to drive the same.
> > >
> > > > +static int uv_migrate_mem_slot(struct kvm *kvm,
> > > > +		const struct kvm_memory_slot *memslot)
> > > > +{
> > > > +	unsigned long gfn = memslot->base_gfn;
> > > > +	unsigned long end;
> > > > +	bool downgrade = false;
> > > > +	struct vm_area_struct *vma;
> > > > +	int i, ret = 0;
> > > > +	unsigned long start = gfn_to_hva(kvm, gfn);
> > > > +
> > > > +	if (kvm_is_error_hva(start))
> > > > +		return H_STATE;
> > > > +
> > > > +	end = start + (memslot->npages << PAGE_SHIFT);
> > > > +
> > > > +	down_write(&kvm->mm->mmap_sem);
> > > > +
> > > > +	mutex_lock(&kvm->arch.uvmem_lock);
> > > > +	vma = find_vma_intersection(kvm->mm, start, end);
> > > > +	if (!vma || vma->vm_start > start || vma->vm_end < end) {
> > > > +		ret = H_STATE;
> > > > +		goto out_unlock;
> > > > +	}
> > > > +
> > > > +	ret = ksm_madvise(vma, vma->vm_start, vma->vm_end,
> > > > +			MADV_UNMERGEABLE, &vma->vm_flags);
> > > > +	downgrade_write(&kvm->mm->mmap_sem);
> > > > +	downgrade = true;
> > > > +	if (ret) {
> > > > +		ret = H_STATE;
> > > > +		goto out_unlock;
> > > > +	}
> > > > +
> > > > +	for (i = 0; i < memslot->npages; i++, ++gfn) {
> > > > +		/* skip paged-in pages and shared pages */
> > > > +		if (kvmppc_gfn_is_uvmem_pfn(gfn, kvm, NULL) ||
> > > > +			kvmppc_gfn_is_uvmem_shared(gfn, kvm))
> > > > +			continue;
> > > > +
> > > > +		start = gfn_to_hva(kvm, gfn);
> > > > +		end = start + (1UL << PAGE_SHIFT);
> > > > +		ret = kvmppc_svm_migrate_page(vma, start, end,
> > > > +			(gfn << PAGE_SHIFT), kvm, PAGE_SHIFT, false);
> > > > +
> > > > +		if (ret)
> > > > +			goto out_unlock;
> > > > +	}
> > >
> > > Is there a guarantee that the vma you got for the start address remains
> > > valid for all the addresses till end in a memslot? If not, you should
> > > re-get the vma for the current address in each iteration, I suppose.
> >
> > mm->mmap_sem is the semaphore that guards the vma, right? If that
> > semaphore is held, can the vma change?
>
> I am not sure if the vma you obtained would span the entire address range
> in the memslot.

Will check, and if a single vma is not guaranteed to cover the whole
range, re-get the vma inside the loop.
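
Something like the following untested sketch, which reuses
find_vma_intersection (the same helper already used before the loop) to
re-look-up the vma for each gfn:

	for (i = 0; i < memslot->npages; i++, ++gfn) {
		/* skip paged-in pages and shared pages */
		if (kvmppc_gfn_is_uvmem_pfn(gfn, kvm, NULL) ||
		    kvmppc_gfn_is_uvmem_shared(gfn, kvm))
			continue;

		start = gfn_to_hva(kvm, gfn);
		end = start + (1UL << PAGE_SHIFT);

		/*
		 * Re-get the vma for the current address: a single vma
		 * need not span the entire memslot.
		 */
		vma = find_vma_intersection(kvm->mm, start, end);
		if (!vma || vma->vm_start > start || vma->vm_end < end) {
			ret = H_STATE;
			goto out_unlock;
		}

		ret = kvmppc_svm_migrate_page(vma, start, end,
				(gfn << PAGE_SHIFT), kvm, PAGE_SHIFT, false);
		if (ret)
			goto out_unlock;
	}

Thanks,
RP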