On 11/12/21 09:28, David Woodhouse wrote:
> I do not recall that we'd actually reached a conclusion that we *will* make the gfn_to_pfn cache generally usable in that fashion. The latest I knew of that discussion was my message at https://lore.kernel.org/kvm/55a5d4e3fbd29dd55e276b97eeaefd0411b3290b.camel@xxxxxxxxxxxxx/ in which I said I'd be a whole lot happier with that if we could do it with RCU instead of an rwlock, but I don't think we can, because we'd need to call synchronize_srcu() in the MMU notifier callback, which might not be permitted to sleep?
Why do you have a problem with the rwlock? If it's per-cache, and it's mostly taken within vCPU context (with the exception of Xen), contention should be nonexistent.
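To make the locking question concrete, here is a minimal sketch of the pattern under discussion, with hypothetical struct and function names rather than the real gfn_to_pfn cache code: the vCPU fast path only takes the per-cache lock for reading, and the MMU notifier takes it for writing just long enough to mark the mapping invalid, so it never needs to sleep. An SRCU-based variant would instead want a synchronize_srcu() at that point, which is exactly the problem David raises above.

#include <linux/spinlock.h>
#include <linux/types.h>

/* Hypothetical cache structure, for illustration only */
struct pfn_cache_sketch {
	rwlock_t lock;
	unsigned long uhva;	/* userspace HVA backing the mapping */
	void *khva;		/* kernel mapping of the cached page */
	bool valid;
};

/* vCPU context: fast path, must not sleep */
static bool pfn_cache_sketch_deliver(struct pfn_cache_sketch *pc)
{
	bool delivered = false;

	read_lock(&pc->lock);
	if (pc->valid) {
		/* ... write the event into pc->khva ... */
		delivered = true;
	}
	read_unlock(&pc->lock);

	/* On failure the caller falls back to a sleepable slow path */
	return delivered;
}

/* MMU notifier context: may not be allowed to sleep, so only flip a flag */
static void pfn_cache_sketch_invalidate(struct pfn_cache_sketch *pc,
					unsigned long start, unsigned long end)
{
	write_lock(&pc->lock);
	if (pc->valid && pc->uhva >= start && pc->uhva < end)
		pc->valid = false;
	write_unlock(&pc->lock);
}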
> I'm also slightly less comfortable with having the MMU notifier work through an arbitrary *list* of gfn_to_pfn caches that it potentially needs to invalidate, but that is very much a minor concern compared with the first. I started looking through the nested code, which is the big user of this facility.
Yes, that's also where I got stuck in my first attempt a few months ago. I agree that it can be changed to use gfn-to-hva caches, except for vmcs12->posted_intr_desc_addr and vmcs12->virtual_apic_page_addr, which the CPU accesses directly by physical address, so a host virtual address is not enough for those two.
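For the parts that can be converted, the existing gfn-to-hva cache helpers already give the shape described here. A rough sketch, with the surrounding structure and field names invented for illustration (this is not the actual nVMX code):

#include <linux/kvm_host.h>

/* Illustrative container; the real nested state looks different */
struct nested_cache_sketch {
	struct gfn_to_hva_cache vmcs12_cache;
};

static int sketch_set_vmcs12_gpa(struct kvm_vcpu *vcpu,
				 struct nested_cache_sketch *nc,
				 gpa_t vmcs12_gpa, unsigned long len)
{
	/* Done once, when the guest hands us the address; may sleep */
	return kvm_gfn_to_hva_cache_init(vcpu->kvm, &nc->vmcs12_cache,
					 vmcs12_gpa, len);
}

static int sketch_read_vmcs12(struct kvm_vcpu *vcpu,
			      struct nested_cache_sketch *nc,
			      void *buf, unsigned long len)
{
	/* Copies through the cached userspace HVA instead of kmapping a pfn */
	return kvm_read_guest_cached(vcpu->kvm, &nc->vmcs12_cache, buf, len);
}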
Paolo
> The important part of the gfn_to_pfn mapping as I've used it for Xen event channel delivery is the fast path in the common case, falling back to a slow path that may need to sleep to revalidate the mapping. That fast vs. slow path (with a workqueue) already existed for irqfd delivery and I just needed to hook into it in the right places.
>
> I didn't see anything in the nested code that would benefit from that same setup, and AFAICT it should all be running with current->mm == kvm->mm, so surely it ought to be able to just access things using the userspace HVA and sleep if necessary? (There's an *entirely* gratuitous map/memcpy/unmap in nested_cache_shadow_vmcs12() that really ought to be kvm_read_guest(). I'll send a patch for that shortly.)
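For reference, the simplification hinted at for nested_cache_shadow_vmcs12() would look roughly like the snippet below. This is a sketch of the idea rather than the actual patch, with the function name invented and the validity checks around it elided.

#include <linux/kvm_host.h>

/* Instead of kvm_vcpu_map() + memcpy() + kvm_vcpu_unmap() ... */
static int sketch_copy_shadow_vmcs12(struct kvm_vcpu *vcpu,
				     gpa_t vmcs_link_pointer,
				     void *shadow_vmcs12, unsigned long size)
{
	/*
	 * Runs in vCPU context with current->mm == kvm->mm, so it is
	 * fine to fault the page in and sleep if necessary.
	 */
	return kvm_read_guest(vcpu->kvm, vmcs_link_pointer,
			      shadow_vmcs12, size);
}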