Re: [PATCH 03/16] KVM: x86: set gfn-to-pfn cache length consistently with VM word size

Paolo Bonzini <pbonzini@xxxxxxxxxx> · Fri, 28 Oct 2022 13:36:20 +0200

On 10/27/22 19:22, Sean Christopherson wrote:
On Thu, Oct 27, 2022, Paolo Bonzini wrote:
So, use the short size at activation time as well.  This means
re-activating the cache if the guest requests the hypercall page
multiple times with different word sizes (this can happen when
kexec-ing, for example).

I don't understand the motivation for allowing a conditionally valid GPA.  I see
a lot of complexity and sub-optimal error handling for a use case that no one
cares about.  Realistically, userspace is never going to provide a GPA that only
works some of the time, because doing otherwise is just asking for a dead guest.

We _should_ be following the Xen API, which does not even say that the
areas have to fit in a single page.  In fact, even Linux's

        struct vcpu_register_runstate_memory_area area;

        area.addr.v = &per_cpu(xen_runstate, cpu);
        if (HYPERVISOR_vcpu_op(VCPUOP_register_runstate_memory_area,
                               xen_vcpu_nr(cpu), &area))

could fail or not just depending on the linker's whims, if I'm not
very confused.

Other data structures *do* have to fit in a page, but the runstate area
does not and it's exactly the one where the cache comes the most handy.
For this I'm going to wait for David to answer.

That said, the whole gpc API is really messy and needs to be cleaned
up beyond what this series does.  For example,

        read_lock_irqsave(&gpc->lock, flags);
        while (!kvm_gfn_to_pfn_cache_check(v->kvm, gpc, gpc->gpa,
                                           sizeof(x))) {
                read_unlock_irqrestore(&gpc->lock, flags);

                if (kvm_gfn_to_pfn_cache_refresh(v->kvm, gpc, gpc->gpa, sizeof(x)))
                        return;

                read_lock_irqsave(&gpc->lock, flags);
        }
	...
        read_unlock_irqrestore(&gpc->lock, flags);

should really be simplified to

	khva = kvm_gpc_lock(gpc);
	if (IS_ERR(khva))
		return;
	...
	kvm_gpc_unlock(gpc);

Only the special preempt-notifier cases would have to use check/refresh
by hand.  If needed they could even pass whatever length they want to
__kvm_gpc_refresh with, explicit marking that it's a here-be-dragons __API.

Also because we're using the gpc from non-preemptible regions the rwlock
critical sections should be enclosed in preempt_disable()/preempt_enable().
Fortunately they're pretty small.

For now I think the best course of action is to quickly get the bugfix
patches to Linus, and for 6.2 drop this one but otherwise keep the length
in kvm_gpc_activate().

Aren't we just making up behavior at this point?  Injecting a #GP into the guest
for what is a completely legal operation from the guest's perspective seems all
kinds of wrong.

Yeah, it is and this patch was a steaming pile...  Though, while this
one is particularly egregious because it comes from an MSR write, a lot
of error returns in xen.c are inviting userspace to make up error
conditions that aren't there in Xen.  In fact activate should probably
ignore a refresh EFAULT altogether.

Paolo