Re: [PATCH Part2 RFC v4 24/40] KVM: SVM: Add KVM_SEV_SNP_LAUNCH_UPDATE command

Brijesh Singh <brijesh.singh@xxxxxxx> · Fri, 16 Jul 2021 17:00:53 -0500

On 7/16/21 3:01 PM, Sean Christopherson wrote:
> On Wed, Jul 07, 2021, Brijesh Singh wrote:
>> +static int snp_launch_update(struct kvm *kvm, struct kvm_sev_cmd *argp)
>> +{
>> +	unsigned long npages, vaddr, vaddr_end, i, next_vaddr;
>> +	struct kvm_sev_info *sev = &to_kvm_svm(kvm)->sev_info;
>> +	struct sev_data_snp_launch_update data = {};
>> +	struct kvm_sev_snp_launch_update params;
>> +	int *error = &argp->error;
>> +	struct kvm_vcpu *vcpu;
>> +	struct page **inpages;
>> +	struct rmpupdate e;
>> +	int ret;
>> +
>> +	if (!sev_snp_guest(kvm))
>> +		return -ENOTTY;
>> +
>> +	if (!sev->snp_context)
>> +		return -EINVAL;
>> +
>> +	if (copy_from_user(&params, (void __user *)(uintptr_t)argp->data, sizeof(params)))
>> +		return -EFAULT;
>> +
>> +	data.gctx_paddr = __psp_pa(sev->snp_context);
>> +
>> +	/* Lock the user memory. */
>> +	inpages = sev_pin_memory(kvm, params.uaddr, params.len, &npages, 1);
> params.uaddr needs to be checked for validity, e.g. proper alignment.
> sev_pin_memory() does some checks, but not all checks.
>
Noted

>> +	if (!inpages)
>> +		return -ENOMEM;
>> +
>> +	vcpu = kvm_get_vcpu(kvm, 0);
>> +	vaddr = params.uaddr;
>> +	vaddr_end = vaddr + params.len;
>> +
>> +	for (i = 0; vaddr < vaddr_end; vaddr = next_vaddr, i++) {
>> +		unsigned long psize, pmask;
>> +		int level = PG_LEVEL_4K;
>> +		gpa_t gpa;
>> +
>> +		if (!hva_to_gpa(kvm, vaddr, &gpa)) {
> I'm having a bit of deja vu...  This flow needs to hold kvm->srcu to do a memslot
> lookup.
>
> That said, IMO having KVM do the hva->gpa is not a great ABI.  The memslots are
> completely arbitrary (from a certain point of view) and have no impact on the
> validity of the memory pinning or PSP command.  E.g. a memslot update while this
> code is in-flight would be all kinds of weird.
>
> In other words, make userspace provide both the hva (because it's sadly needed
> to pin memory) as well as the target gpa.  That prevents KVM from having to deal
> with memslot lookups and also means that userspace can issue the command before
> configuring the memslots (though I've no idea if that's actually feasible for
> any userspace VMM).

The operation happen during the guest creation time so I was not sure if
memslot will be updated while we are executing this command. But I guess
its possible that a VMM may run different thread which may update
memslot while another thread calls the encryption. I'll let userspace
provide both the HVA and GPA as you recommended.

>> +			ret = -EINVAL;
>> +			goto e_unpin;
>> +		}
>> +
>> +		psize = page_level_size(level);
>> +		pmask = page_level_mask(level);
> Is there any hope of this path supporting 2mb/1gb pages in the not-too-distant
> future?  If not, then I vote to do away with the indirection and just hardcode
> 4kg sizes in the flow.  I.e. if this works on 4kb chunks, make that obvious.

No plans to do 1g/2mb in this path. I will make that obvious by
hardcoding it.

>> +		gpa = gpa & pmask;
>> +
>> +		/* Transition the page state to pre-guest */
>> +		memset(&e, 0, sizeof(e));
>> +		e.assigned = 1;
>> +		e.gpa = gpa;
>> +		e.asid = sev_get_asid(kvm);
>> +		e.immutable = true;
>> +		e.pagesize = X86_TO_RMP_PG_LEVEL(level);
>> +		ret = rmpupdate(inpages[i], &e);
> What happens if userspace pulls a stupid and assigns the same page to multiple
> SNP guests?  Does RMPUPDATE fail?  Can one RMPUPDATE overwrite another?

The RMPUPDATE is available to the hv and it can call anytime with
whatever it want. The important thing is the RMPUPDATE + PVALIDATE
combination is what locks the page. In this case, PSP firmware updates
the RMP table and also validates the page.

If someone else attempts to issue another RMPUPDATE then Validated bit
will be cleared and page is no longer used as a private. Access to
unvalidated page will cause #VC.

>
>> +		if (ret) {
>> +			ret = -EFAULT;
>> +			goto e_unpin;
>> +		}
>> +
>> +		data.address = __sme_page_pa(inpages[i]);
>> +		data.page_size = e.pagesize;
>> +		data.page_type = params.page_type;
>> +		data.vmpl3_perms = params.vmpl3_perms;
>> +		data.vmpl2_perms = params.vmpl2_perms;
>> +		data.vmpl1_perms = params.vmpl1_perms;
>> +		ret = __sev_issue_cmd(argp->sev_fd, SEV_CMD_SNP_LAUNCH_UPDATE, &data, error);
>> +		if (ret) {
>> +			snp_page_reclaim(inpages[i], e.pagesize);
>> +			goto e_unpin;
>> +		}
>> +
>> +		next_vaddr = (vaddr & pmask) + psize;
>> +	}
>> +
>> +e_unpin:
>> +	/* Content of memory is updated, mark pages dirty */
>> +	memset(&e, 0, sizeof(e));
>> +	for (i = 0; i < npages; i++) {
>> +		set_page_dirty_lock(inpages[i]);
>> +		mark_page_accessed(inpages[i]);
>> +
>> +		/*
>> +		 * If its an error, then update RMP entry to change page ownership
>> +		 * to the hypervisor.
>> +		 */
>> +		if (ret)
>> +			rmpupdate(inpages[i], &e);
> This feels wrong since it's purging _all_ RMP entries, not just those that were
> successfully modified.  And maybe add a RMP "reset" helper, e.g. why is zeroing
> the RMP entry the correct behavior?

By default all the pages are hypervior owned (i.e zero). If the
LAUNCH_UPDATE was successful then page should have transition from the
hypervisor owned to guest valid. By zero'ing it are reverting it back to
hypevisor owned.

I agree that I optimize it to clear the modified entries only and leave
everything else as a default.

thanks

>> +	}
>> +
>> +	/* Unlock the user pages */
>> +	sev_unpin_memory(kvm, inpages, npages);
>> +
>> +	return ret;
>> +}
>> +