Re: [PATCH v4 08/16] KVM: x86: Annotate -EFAULTs from kvm_handle_error_pfn()

Sean Christopherson <seanjc@xxxxxxxxxx> · Wed, 14 Jun 2023 13:03:30 -0700

On Fri, Jun 02, 2023, Anish Moorthy wrote:
> Implement KVM_CAP_MEMORY_FAULT_INFO for efaults generated by
> kvm_handle_error_pfn().
> 
> Signed-off-by: Anish Moorthy <amoorthy@xxxxxxxxxx>
> ---
>  arch/x86/kvm/mmu/mmu.c | 13 +++++++++++++
>  1 file changed, 13 insertions(+)
> 
> diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
> index c8961f45e3b1..cb71aae9aaec 100644
> --- a/arch/x86/kvm/mmu/mmu.c
> +++ b/arch/x86/kvm/mmu/mmu.c
> @@ -3291,6 +3291,10 @@ static void kvm_send_hwpoison_signal(struct kvm_memory_slot *slot, gfn_t gfn)
>  
>  static int kvm_handle_error_pfn(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault)
>  {
> +	uint64_t rounded_gfn;

gfn_t, and probably no need to specificy "rounded", let the code do the talking.

> +	uint64_t fault_size;
> +	uint64_t fault_flags;

With a few exceptions that we should kill off, KVM uses "u64", not "uint64_t".
Though arguably the "size" should be gfn_t too.

And these can all go on a single line, e.g.

	u64 fault_size, fault_flags;

Though since the kvm_run.memory_fault field and the param to the helper are "len",
a simple "len" here is better IMO.

And since this is not remotely performance sensitive, I wouldn't bother deferring
the initialization, e.g.

	gfn_t gfn = gfn_round_for_level(fault->gfn, fault->goal_level);
	gfn_t len = KVM_HPAGE_SIZE(fault->goal_level);
	u64 fault_flags;

All that said, consuming fault->goal_level is unnecessary, and not be coincidence.
The *only* time KVM should bail to userspace is if KVM failed to faultin a 4KiB
page.  Providing a hugepage is done opportunistically, it's never a hard requirement.
So throw away all of the above and see below.

> +
>  	if (is_sigpending_pfn(fault->pfn)) {
>  		kvm_handle_signal_exit(vcpu);
>  		return -EINTR;
> @@ -3309,6 +3313,15 @@ static int kvm_handle_error_pfn(struct kvm_vcpu *vcpu, struct kvm_page_fault *fa
>  		return RET_PF_RETRY;
>  	}
>  
> +	fault_size = KVM_HPAGE_SIZE(fault->goal_level);
> +	rounded_gfn = round_down(fault->gfn * PAGE_SIZE, fault_size);

We have a helper for this too, gfn_round_for_level().  Ooh, but this isn't storing
a gfn, it's storing a gpa.  Naughty, naughty.

> +
> +	fault_flags = 0;
> +	if (fault->write)
> +		fault_flags |= KVM_MEMORY_FAULT_FLAG_WRITE;
> +	if (fault->exec)

exec and write are mutually exclusive.  There's even documented precedent for
this in page_fault_can_be_fast().

So with a READ flag, this can be

	if (fault->write)
		fault_flags = KVM_MEMORY_FAULT_FLAG_WRITE;
	else if (fault->exec)
		fault_flags = KVM_MEMORY_FAULT_FLAG_EXEC;
	else
		fault_flags = KVM_MEMORY_FAULT_FLAG_READ;

Or as Paolo would probably write it ;-)

	fault_flags = (fault->write & 1) << KVM_MEMORY_FAULT_FLAG_WRITE_SHIFT |
		      (fault->exec & 1) << KVM_MEMORY_FAULT_FLAG_EXEC_SHIFT |
		      (!fault->write && !fault->exec) << KVM_MEMORY_FAULT_FLAG_READ_SHIFT;

(that was a joke, don't actually do that)

> +		fault_flags |= KVM_MEMORY_FAULT_FLAG_EXEC;
> +	kvm_populate_efault_info(vcpu, rounded_gfn, fault_size, fault_flags);

This is where passing a "gfn" variable as a "gpa" looks broken.

>  	return -EFAULT;

All in all, something like this?

	u64 fault_flags;

	<other error handling>

	/* comment goes here */
	WARN_ON_ONCE(fault->goal_level != PG_LEVEL_4K);

	if (fault->write)
		fault_flags = KVM_MEMORY_FAULT_FLAG_WRITE;
	else if (fault->exec)
		fault_flags = KVM_MEMORY_FAULT_FLAG_EXEC;
	else
		fault_flags = KVM_MEMORY_FAULT_FLAG_READ;

	kvm_handle_blahblahblah_fault(vcpu, gfn_to_gpa(fault->gfn), PAGE_SIZE,
				      fault_flags);