On Fri, 2021-11-12 at 13:27 +0100, Paolo Bonzini wrote:
> On 11/12/21 12:29, David Woodhouse wrote:
> > We *could* use the rwlock thing for steal time reporting, but I still
> > don't really think it's worth doing so. Again, if it was truly going to
> > be a generic mechanism that would solve lots of other problems, I'd be
> > up for it. But if steal time would be the *only* other user of a
> > generic version of the rwlock thing, that just seems like
> > overengineering. I'm still mostly inclined to stand by my original
> > observation that it has a perfectly serviceable HVA that it can use
> > instead.
>
> Well yeah, I'd have to see the code to decide. But maybe this is where
> we disagree: what I want from generic KVM is a nice and easy-to-use API.
> I only care to a certain extent how messy it is inside, because a nice
> generic API means not reinventing the wheel across architectures.

I'm happy enough with that if I understand how to make it make sense :)

> That said, I think that your patch is much more complicated than it
> should be, because it hooks in the wrong place. There are two cases:
>
> 1) for kmap/kunmap, the invalidate_range() notifier is the right place
> to remove any references taken after invalidate_range_start().

Right, I added it in invalidate_range_start() because it was *simpler*
that way (I get given a GFN, not an HVA), but that's wrong; it really
does need to be in invalidate_range(), as you say.

> For the
> APIC access page it needs a kvm_make_all_cpus_request, but for the
> shinfo page it can be a simple back-to-back write_lock/write_unlock
> (a super-poor-man RCU, if you wish). And it can be extended to a
> per-cache rwlock.

A back-to-back write_lock/write_unlock *without* setting the address to
KVM_UNMAPPED_PAGE? I'm not sure I see how that protects the IRQ
delivery from accessing the (now stale) physical page after the MMU
notifier has completed?
Not unless it's going to call hva_to_pfn again for itself under the
read_lock, every time it delivers an IRQ? My version marked the PFN
dirty and invalidated the hva pointer so that nothing will touch it
again, but left it actually *mapped* because we can't necessarily sleep
to unmap. But that's all it did in the notifier:

+	if (static_branch_unlikely(&kvm_xen_enabled.key)) {
+		write_lock(&kvm->arch.xen.shinfo_lock);
+
+		if (kvm->arch.xen.shared_info &&
+		    kvm->arch.xen.shinfo_gfn >= range->start &&
+		    kvm->arch.xen.shinfo_cache.gfn < range->end) {
+			/*
+			 * If kvm_xen_shared_info_init() had *finished* mapping the
+			 * page and assigned the pointer for real, then mark the page
+			 * dirty now instead of via the eventual cache teardown.
+			 */
+			if (kvm->arch.xen.shared_info != KVM_UNMAPPED_PAGE) {
+				kvm_set_pfn_dirty(kvm->arch.xen.shinfo_cache.pfn);
+				kvm->arch.xen.shinfo_cache.dirty = false;
+			}
+
+			kvm->arch.xen.shared_info = NULL;
+		}
+
+		write_unlock(&kvm->arch.xen.shinfo_lock);
+	}

> 2) for memremap/memunmap, all you really care about is reacting to
> changes in the memslots, so the MMU notifier integration has nothing
> to do. You still need to call the same hook as
> kvm_mmu_notifier_invalidate_range() when memslots change, so that
> the update is done outside atomic context.

Hm, we definitely *do* care about reacting to MMU notifiers in this
case too. Userspace can do memory overcommit / ballooning etc.
*without* changing the memslots, using only mmap/munmap/userfaultfd on
the corresponding HVA ranges.

> So as far as short-term uses of the cache are concerned, all it
> takes (if I'm right...) is a list_for_each_entry in
> kvm_mmu_notifier_invalidate_range, visiting the list of
> gfn-to-pfn caches and lock-unlock each cache's rwlock. Plus
> a way to inform the code of memslot changes before any atomic
> code tries to use an invalid cache.

I do the memslot part already.
It basically ends up being

	if (!map->hva || map->hva == KVM_UNMAPPED_PAGE ||
	    slots->generation != ghc->generation ||
	    kvm_is_error_hva(ghc->hva) || !ghc->memslot) {
		ret = -EWOULDBLOCK;	/* Try again when you can sleep */
		goto out_unlock;
	}

> > It looks like they want their own way of handling it; if the GPA being
> > invalidated matches posted_intr_desc_addr or virtual_apic_page_addr
> > then the MMU notifier just needs to call kvm_make_all_cpus_request()
> > with some suitable checking/WARN magic around the "we will never need
> > to sleep when we shouldn't" assertion that you made above.
>
> I was thinking of an extra flag to decide whether (in addition
> to the write_lock/write_unlock) the MMU notifier also needs to do
> the kvm_make_all_cpus_request:

Yeah, OK. I think we want to kill the struct kvm_host_map completely,
merge its extra 'hva' and 'page' fields into the (possibly renamed)
gfn_to_pfn_cache along with your 'guest_uses_pa' flag, and take it from
there.

I'll go knock up something that we can heckle further...