On Wed, Jul 5, 2023 at 10:19 PM Zhi Wang <zhi.wang.linux@xxxxxxxxx> wrote:
>
> On Tue, 4 Jul 2023 16:50:48 +0900
> David Stevens <stevensd@xxxxxxxxxxxx> wrote:
>
> > From: David Stevens <stevensd@xxxxxxxxxxxx>
> >
> > Make it so that __kvm_follow_pfn does not imply FOLL_GET. This allows
> > callers to resolve a gfn when the associated pfn has a valid struct page
> > that isn't being actively refcounted (e.g. tail pages of non-compound
> > higher order pages). For a caller to safely omit FOLL_GET, all usages of
> > the returned pfn must be guarded by a mmu notifier.
> >
> > This also adds a is_refcounted_page out parameter to kvm_follow_pfn that
> > is set when the returned pfn has an associated struct page with a valid
> > refcount. Callers that don't pass FOLL_GET should remember this value
> > and use it to avoid places like kvm_is_ad_tracked_page that assume a
> > non-zero refcount.
> >
> > Signed-off-by: David Stevens <stevensd@xxxxxxxxxxxx>
> > ---
> >  include/linux/kvm_host.h | 10 ++++++
> >  virt/kvm/kvm_main.c      | 67 +++++++++++++++++++++-------------------
> >  virt/kvm/pfncache.c      |  2 +-
> >  3 files changed, 47 insertions(+), 32 deletions(-)
> >
> > diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
> > index ef2763c2b12e..a45308c7d2d9 100644
> > --- a/include/linux/kvm_host.h
> > +++ b/include/linux/kvm_host.h
> > @@ -1157,6 +1157,9 @@ unsigned long gfn_to_hva_memslot_prot(struct kvm_memory_slot *slot, gfn_t gfn,
> >  void kvm_release_page_clean(struct page *page);
> >  void kvm_release_page_dirty(struct page *page);
> >
> > +void kvm_set_page_accessed(struct page *page);
> > +void kvm_set_page_dirty(struct page *page);
> > +
> >  struct kvm_follow_pfn {
> >         const struct kvm_memory_slot *slot;
> >         gfn_t gfn;
> > @@ -1164,10 +1167,17 @@ struct kvm_follow_pfn {
> >         bool atomic;
> >         /* Allow a read fault to create a writeable mapping. */
> >         bool allow_write_mapping;
> > +       /*
> > +        * Usage of the returned pfn will be guared by a mmu notifier. Must
>                                                 ^guarded
> > +        * be true if FOLL_GET is not set.
> > +        */
> > +       bool guarded_by_mmu_notifier;
>
> It seems no one sets the guarded_by_mmu_notifier in this patch. Is
> guarded_by_mmu_notifier always equal to !foll->FOLL_GET and set by the
> caller of __kvm_follow_pfn()?

Yes, this is the case.

> If yes, do we have to use FOLL_GET to resolve GFN associated with a tail page?
> It seems gup can tolerate gup_flags without FOLL_GET, but it is more like a
> temporary solution. I don't think it is a good idea to play tricks with
> a temporary solution, more like we are abusing the toleration.

I'm not sure I understand what you're getting at. This series never
calls gup without FOLL_GET. This series aims to provide kvm_follow_pfn
as a unified API on top of gup+follow_pte. Since one of the major
clients of this API uses an mmu notifier, it makes sense to support
returning a pfn without taking a reference. And we indeed need to do
that for certain types of memory.

> Is a flag like guarded_by_mmu_notifier (perhaps a better name) enough to
> indicate a tail page?

What do you mean by to indicate a tail page? Do you mean to indicate
that the returned pfn refers to a non-refcounted page? That's
specified by is_refcounted_page.

-David