On Mon, 2021-11-15 at 22:59 +0000, Sean Christopherson wrote:
> On Mon, Nov 15, 2021, Paolo Bonzini wrote:
> > On 11/15/21 20:11, David Woodhouse wrote:
> > > > Changing mn_memslots_update_rcuwait to a waitq (and renaming it to
> > > > mn_invalidate_waitq) is of course also a possibility.
> > >
> > > I suspect that's the answer.
> > >
> > > I think the actual *invalidation* of the cache still lives in the
> > > invalidate_range() callback where I have it at the moment.
>
> Oooh! [finally had a lightbulb moment about ->invalidate_range() after
> years of befuddlement].
>
> Two things:
>
> 1. Using _only_ ->invalidate_range() is not correct. ->invalidate_range()
>    is required if and only if the old PFN needs to be _unmapped_.
>    Specifically, if the protections are being downgraded without changing
>    the PFN, it doesn't need to be called. E.g. from
>    hugetlb_change_protection():

OK, that's kind of important to realise. Thanks.

So, I had just split the atomic and guest-mode invalidations apart:

https://git.infradead.org/users/dwmw2/linux.git/commitdiff/6cf5fe318fd

but will go back to doing it all in invalidate_range_start from a single
list. And just deal with the fact that the atomic users now have to
loop/retry/wait for there *not* to be an MMU notification in progress.

> I believe we could use ->invalidate_range() to handle the unmap case if
> KVM's ->invalidate_range_start() hook is enhanced to handle the RW=>R
> case. The "struct mmu_notifier_range" provides the event type, IIUC we
> could have the _start() variant handle MMU_NOTIFY_PROTECTION_{VMA,PAGE}
> (and maybe MMU_NOTIFY_SOFT_DIRTY?), and let the more precise unmap-only
> variant handle everything else.

Not sure that helps us much. It was the termination condition on the
"when should we keep retrying, and when should we give up?" that was
painful, and a mixed mode doesn't make that problem go away.
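For concreteness, the loop/retry/wait pattern I have in mind looks roughly
like this. This is only a single-threaded userspace sketch with stand-in
names (fake_kvm, gpc_refresh, mn_active_invalidate_count), not the real kvm
structures; the in-tree analogue is the mmu_notifier_seq retry dance in
kvm_main.c:

```c
#include <stdbool.h>

struct fake_kvm {
	unsigned long mn_active_invalidate_count; /* ++ in _start(), -- in _end() */
	unsigned long mmu_notifier_seq;           /* bumped on every invalidation */
};

/* Returns true once a mapping was established with no notifier racing. */
bool gpc_refresh(struct fake_kvm *kvm)
{
	unsigned long seq;

	do {
		/* Don't even start while an invalidation is in flight;
		 * the real code would sleep on mn_invalidate_waitq. */
		while (kvm->mn_active_invalidate_count)
			;

		seq = kvm->mmu_notifier_seq;

		/* ... hva_to_pfn() and remap the page here ... */

		/* Retry if an invalidation started or fired meanwhile. */
	} while (kvm->mn_active_invalidate_count ||
		 seq != kvm->mmu_notifier_seq);

	return true;
}
```

The point being that the termination condition is simple precisely because
everything funnels through one invalidation count and one sequence number.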
I'll go back and have another look in the morning, with something much
closer to what I showed in

https://lore.kernel.org/kvm/040d61dad066eb2517c108232efb975bc1cda780.camel@xxxxxxxxxxxxx/

> 2. If we do split the logic across the two hooks, we should (a) do it in
>    a separate series and (b) make the logic common to the gfn_to_pfn
>    cache and to the standard kvm_unmap_gfn_range().
>
> > Yes, I think sooner or later we also want all pfn stuff in one file
> > (together with MMU notifiers) and all hva stuff in another; so for now
> > you can create virt/kvm/hva_to_pfn.h, or virt/kvm/mm.h, or whatever
> > color of the bikeshed you prefer.
>
> Preemptive bikeshed strike... the MMU notifiers aren't strictly "pfn
> stuff", as they operate on HVAs. I don't know exactly what Paolo has in
> mind, but kvm/mm.h or kvm/kvm_mm.h seems like it's less likely to become
> stale in the future.

I'd moved kvm/mmu_lock.h to kvm/kvm_mm.h and added to it:

https://git.infradead.org/users/dwmw2/linux.git/commitdiff/a247bc2d0d9

(which I'll make retrospective as I rework the series).

After frowning a little at all the different architectures' Makefiles that
all add the same(ish) list of $(KVM)/foobar.o, I ended up punting that
problem by only adding pfncache.o on x86 anyway.

If we're going to split other parts of kvm_main.c out into smaller files,
providing a Makefile snippet in virt/kvm/Makefile.kvm that gives the
*list* of those files would be a useful thing to do. But
arch/powerpc/kvm/Makefile makes my head hurt too much for me to be shaving
that particular yak tonight (why is $(KVM)/irqchip.o handled differently
to the rest...?)
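For what it's worth, a rough sketch of what such a virt/kvm/Makefile.kvm
fragment might look like; the object list here is illustrative, cribbed
from what the arch Makefiles each repeat today, not a definitive set:

```make
# virt/kvm/Makefile.kvm (sketch)
KVM ?= ../../../virt/kvm

kvm-y += $(KVM)/kvm_main.o $(KVM)/eventfd.o $(KVM)/binary_stats.o
kvm-$(CONFIG_KVM_VFIO) += $(KVM)/vfio.o
kvm-$(CONFIG_KVM_MMIO) += $(KVM)/coalesced_mmio.o
kvm-$(CONFIG_KVM_ASYNC_PF) += $(KVM)/async_pf.o
kvm-$(CONFIG_HAVE_KVM_IRQ_ROUTING) += $(KVM)/irqchip.o
```

Each arch Makefile could then just `include $(srctree)/virt/kvm/Makefile.kvm`
instead of spelling out its own copy of the list.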