On Wed, Jul 5, 2023 at 8:56 PM Yu Zhang <yu.c.zhang@xxxxxxxxxxxxxxx> wrote:
>
> On Tue, Jul 04, 2023 at 04:50:48PM +0900, David Stevens wrote:
> > From: David Stevens <stevensd@xxxxxxxxxxxx>
> >
> > Make it so that __kvm_follow_pfn does not imply FOLL_GET. This allows
> > callers to resolve a gfn when the associated pfn has a valid struct page
> > that isn't being actively refcounted (e.g. tail pages of non-compound
> > higher order pages). For a caller to safely omit FOLL_GET, all usages of
> > the returned pfn must be guarded by a mmu notifier.
> >
> > This also adds a is_refcounted_page out parameter to kvm_follow_pfn that
> > is set when the returned pfn has an associated struct page with a valid
> > refcount. Callers that don't pass FOLL_GET should remember this value
> > and use it to avoid places like kvm_is_ad_tracked_page that assume a
> > non-zero refcount.
> >
> > Signed-off-by: David Stevens <stevensd@xxxxxxxxxxxx>
> > ---
> >  include/linux/kvm_host.h | 10 ++++++
> >  virt/kvm/kvm_main.c      | 67 +++++++++++++++++++++-------------------
> >  virt/kvm/pfncache.c      |  2 +-
> >  3 files changed, 47 insertions(+), 32 deletions(-)
> >
> > diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
> > index ef2763c2b12e..a45308c7d2d9 100644
> > --- a/include/linux/kvm_host.h
> > +++ b/include/linux/kvm_host.h
> > @@ -1157,6 +1157,9 @@ unsigned long gfn_to_hva_memslot_prot(struct kvm_memory_slot *slot, gfn_t gfn,
> >  void kvm_release_page_clean(struct page *page);
> >  void kvm_release_page_dirty(struct page *page);
> >
> > +void kvm_set_page_accessed(struct page *page);
> > +void kvm_set_page_dirty(struct page *page);
> > +
> >  struct kvm_follow_pfn {
> >  	const struct kvm_memory_slot *slot;
> >  	gfn_t gfn;
> > @@ -1164,10 +1167,17 @@ struct kvm_follow_pfn {
> >  	bool atomic;
> >  	/* Allow a read fault to create a writeable mapping. */
> >  	bool allow_write_mapping;
> > +	/*
> > +	 * Usage of the returned pfn will be guared by a mmu notifier.  Must
> > +	 * be true if FOLL_GET is not set.
> > +	 */
> > +	bool guarded_by_mmu_notifier;
>
> And how? Any place to check the invalidate seq?

kvm_follow_pfn can't meaningfully validate the seq number, since the
mmu notifier locking is handled by the caller. This is more of a
sanity check that the API is being used properly, as proposed here
[1]. I did deviate from the proposal with a bool instead of some type
of integer, since the exact value of mmu_seq wouldn't be useful.

[1] https://lore.kernel.org/all/ZGvUsf7lMkrNDHuE@xxxxxxxxxx/#t

-David