On 11/12/21 09:28, David Woodhouse wrote:
> I do not recall that we'd actually reached a conclusion that we *will* make the gfn_to_pfn cache generally usable in that fashion. The latest I knew of that discussion was my message at https://lore.kernel.org/kvm/55a5d4e3fbd29dd55e276b97eeaefd0411b3290b.camel@xxxxxxxxxxxxx/ in which I said I'd be a whole lot happier with that if we could do it with RCU instead of an rwlock, but I don't think we can, because we'd need to call synchronize_srcu() in the MMU notifier callback, which might not be permitted to sleep?
Why do you have a problem with the rwlock? If it's per-cache, and it's mostly taken within vCPU context (with the exception of Xen), contention should be nonexistent.
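To make the locking question concrete, here is a minimal sketch of the pattern under discussion, with hypothetical struct and function names rather than the real gfn_to_pfn cache code: the vCPU fast path only takes the per-cache lock for reading, and the MMU notifier takes it for writing just long enough to mark the mapping invalid, so it never needs to sleep. An SRCU-based variant would instead want a synchronize_srcu() at that point, which is exactly the problem David raises above.

#include <linux/spinlock.h>
#include <linux/types.h>

/* Hypothetical cache structure, for illustration only */
struct pfn_cache_sketch {
	rwlock_t lock;
	unsigned long uhva;	/* userspace HVA backing the mapping */
	void *khva;		/* kernel mapping of the cached page */
	bool valid;
};

/* vCPU context: fast path, must not sleep */
static bool pfn_cache_sketch_deliver(struct pfn_cache_sketch *pc)
{
	bool delivered = false;

	read_lock(&pc->lock);
	if (pc->valid) {
		/* ... write the event into pc->khva ... */
		delivered = true;
	}
	read_unlock(&pc->lock);

	/* On failure the caller falls back to a sleepable slow path */
	return delivered;
}

/* MMU notifier context: may not be allowed to sleep, so only flip a flag */
static void pfn_cache_sketch_invalidate(struct pfn_cache_sketch *pc,
					unsigned long start, unsigned long end)
{
	write_lock(&pc->lock);
	if (pc->valid && pc->uhva >= start && pc->uhva < end)
		pc->valid = false;
	write_unlock(&pc->lock);
}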
> I'm also slightly less comfortable with having the MMU notifier work through an arbitrary *list* of gfn_to_pfn caches that it potentially needs to invalidate, but that is very much a minor concern compared with the first. I started looking through the nested code, which is the big user of this facility.
Yes, that's also where I got stuck in my first attempt a few months ago. I agree that it can be changed to use gfn-to-hva caches, except for vmcs12->posted_intr_desc_addr and vmcs12->virtual_apic_page_addr, which the CPU accesses directly by physical address, so a host virtual address is not enough for those two.
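For the parts that can be converted, the existing gfn-to-hva cache helpers already give the shape described here. A rough sketch, with the surrounding structure and field names invented for illustration (this is not the actual nVMX code):

#include <linux/kvm_host.h>

/* Illustrative container; the real nested state looks different */
struct nested_cache_sketch {
	struct gfn_to_hva_cache vmcs12_cache;
};

static int sketch_set_vmcs12_gpa(struct kvm_vcpu *vcpu,
				 struct nested_cache_sketch *nc,
				 gpa_t vmcs12_gpa, unsigned long len)
{
	/* Done once, when the guest hands us the address; may sleep */
	return kvm_gfn_to_hva_cache_init(vcpu->kvm, &nc->vmcs12_cache,
					 vmcs12_gpa, len);
}

static int sketch_read_vmcs12(struct kvm_vcpu *vcpu,
			      struct nested_cache_sketch *nc,
			      void *buf, unsigned long len)
{
	/* Copies through the cached userspace HVA instead of kmapping a pfn */
	return kvm_read_guest_cached(vcpu->kvm, &nc->vmcs12_cache, buf, len);
}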
Paolo
> The important part of the gfn_to_pfn mapping as I've used it for Xen event channel delivery is the fast path in the common case, falling back to a slow path that may need to sleep to revalidate the mapping. That fast vs. slow path (with a workqueue) already existed for irqfd delivery and I just needed to hook into it in the right places.
>
> I didn't see anything in the nested code that would benefit from that same setup, and AFAICT it should all be running with current->mm == kvm->mm, so surely it ought to be able to just access things using the userspace HVA and sleep if necessary? (There's an *entirely* gratuitous map/memcpy/unmap in nested_cache_shadow_vmcs12() that really ought to be kvm_read_guest(). I'll send a patch for that shortly.)
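For reference, the simplification hinted at for nested_cache_shadow_vmcs12() would look roughly like the snippet below. This is a sketch of the idea rather than the actual patch, with the function name invented and the validity checks around it elided.

#include <linux/kvm_host.h>

/* Instead of kvm_vcpu_map() + memcpy() + kvm_vcpu_unmap() ... */
static int sketch_copy_shadow_vmcs12(struct kvm_vcpu *vcpu,
				     gpa_t vmcs_link_pointer,
				     void *shadow_vmcs12, unsigned long size)
{
	/*
	 * Runs in vCPU context with current->mm == kvm->mm, so it is
	 * fine to fault the page in and sleep if necessary.
	 */
	return kvm_read_guest(vcpu->kvm, vmcs_link_pointer,
			      shadow_vmcs12, size);
}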