On Mon, 2021-11-15 at 22:59 +0000, Sean Christopherson wrote:
> On Mon, Nov 15, 2021, Paolo Bonzini wrote:
> > On 11/15/21 20:11, David Woodhouse wrote:
> > > > Changing mn_memslots_update_rcuwait to a waitq (and renaming it to
> > > > mn_invalidate_waitq) is of course also a possibility.
> > >
> > > I suspect that's the answer.
> > >
> > > I think the actual *invalidation* of the cache still lives in the
> > > invalidate_range() callback where I have it at the moment.
>
> Oooh! [finally had a lightbulb moment about ->invalidate_range() after
> years of befuddlement].
>
> Two things:
>
> 1. Using _only_ ->invalidate_range() is not correct. ->invalidate_range()
>    is required if and only if the old PFN needs to be _unmapped_.
>    Specifically, if the protections are being downgraded without changing
>    the PFN, it doesn't need to be called. E.g. from
>    hugetlb_change_protection():

OK, that's kind of important to realise. Thanks.

So, I had just split the atomic and guest-mode invalidations apart:

https://git.infradead.org/users/dwmw2/linux.git/commitdiff/6cf5fe318fd

but will go back to doing it all in invalidate_range_start from a single
list. And just deal with the fact that the atomic users now have to
loop/retry/wait for there *not* to be an MMU notification in progress.

> I believe we could use ->invalidate_range() to handle the unmap case if
> KVM's ->invalidate_range_start() hook is enhanced to handle the RW=>R
> case. The "struct mmu_notifier_range" provides the event type, IIUC we
> could have the _start() variant handle MMU_NOTIFY_PROTECTION_{VMA,PAGE}
> (and maybe MMU_NOTIFY_SOFT_DIRTY?), and let the more precise unmap-only
> variant handle everything else.

Not sure that helps us much. It was the termination condition on the
"when should we keep retrying, and when should we give up?" that was
painful, and a mixed mode doesn't make that problem go away.
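For concreteness, the loop/retry/wait pattern I have in mind looks roughly
like this. This is only a single-threaded userspace sketch with stand-in
names (fake_kvm, gpc_refresh, mn_active_invalidate_count), not the real kvm
structures; the in-tree analogue is the mmu_notifier_seq retry dance in
kvm_main.c:

```c
#include <stdbool.h>

struct fake_kvm {
	unsigned long mn_active_invalidate_count; /* ++ in _start(), -- in _end() */
	unsigned long mmu_notifier_seq;           /* bumped on every invalidation */
};

/* Returns true once a mapping was established with no notifier racing. */
bool gpc_refresh(struct fake_kvm *kvm)
{
	unsigned long seq;

	do {
		/* Don't even start while an invalidation is in flight;
		 * the real code would sleep on mn_invalidate_waitq. */
		while (kvm->mn_active_invalidate_count)
			;

		seq = kvm->mmu_notifier_seq;

		/* ... hva_to_pfn() and remap the page here ... */

		/* Retry if an invalidation started or fired meanwhile. */
	} while (kvm->mn_active_invalidate_count ||
		 seq != kvm->mmu_notifier_seq);

	return true;
}
```

The point being that the termination condition is simple precisely because
everything funnels through one invalidation count and one sequence number.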
I'll go back and have another look in the morning, with something much
closer to what I showed in

https://lore.kernel.org/kvm/040d61dad066eb2517c108232efb975bc1cda780.camel@xxxxxxxxxxxxx/

> 2. If we do split the logic across the two hooks, we should (a) do it in
>    a separate series and (b) make the logic common to the gfn_to_pfn
>    cache and to the standard kvm_unmap_gfn_range().
>
> > Yes, I think sooner or later we also want all pfn stuff in one file
> > (together with MMU notifiers) and all hva stuff in another; so for now
> > you can create virt/kvm/hva_to_pfn.h, or virt/kvm/mm.h, or whatever
> > color of the bikeshed you prefer.
>
> Preemptive bikeshed strike... the MMU notifiers aren't strictly "pfn
> stuff", as they operate on HVAs. I don't know exactly what Paolo has in
> mind, but kvm/mm.h or kvm/kvm_mm.h seems like it's less likely to become
> stale in the future.

I'd moved kvm/mmu_lock.h to kvm/kvm_mm.h and added to it:

https://git.infradead.org/users/dwmw2/linux.git/commitdiff/a247bc2d0d9

(which I'll make retrospective as I rework the series).

After frowning a little at all the different architectures' Makefiles that
all add the same(ish) list of $(KVM)/foobar.o, I ended up punting that
problem by only adding pfncache.o on x86 anyway.

If we're going to split other parts of kvm_main.c out into smaller files,
providing a Makefile snippet in virt/kvm/Makefile.kvm that gives the
*list* of those files would be a useful thing to do. But
arch/powerpc/kvm/Makefile makes my head hurt too much for me to be shaving
that particular yak tonight (why is $(KVM)/irqchip.o handled differently
to the rest...?)
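For what it's worth, a rough sketch of what such a virt/kvm/Makefile.kvm
fragment might look like; the object list here is illustrative, cribbed
from what the arch Makefiles each repeat today, not a definitive set:

```make
# virt/kvm/Makefile.kvm (sketch)
KVM ?= ../../../virt/kvm

kvm-y += $(KVM)/kvm_main.o $(KVM)/eventfd.o $(KVM)/binary_stats.o
kvm-$(CONFIG_KVM_VFIO) += $(KVM)/vfio.o
kvm-$(CONFIG_KVM_MMIO) += $(KVM)/coalesced_mmio.o
kvm-$(CONFIG_KVM_ASYNC_PF) += $(KVM)/async_pf.o
kvm-$(CONFIG_HAVE_KVM_IRQ_ROUTING) += $(KVM)/irqchip.o
```

Each arch Makefile could then just `include $(srctree)/virt/kvm/Makefile.kvm`
instead of spelling out its own copy of the list.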