Hi Ackerley,

On Mon, 17 Mar 2025 at 16:27, Ackerley Tng <ackerleytng@xxxxxxxxxx> wrote:
>
> Vlastimil Babka <vbabka@xxxxxxx> writes:
>
> > On 3/13/25 14:49, Ackerley Tng wrote:
> >> Fuad Tabba <tabba@xxxxxxxxxx> writes:
> >>
> >>> In some architectures, KVM could be defined as a module. If there is a
> >>> pending folio_put() while KVM is unloaded, the system could crash. By
> >>> having a helper check for that and call the function only if it's
> >>> available, we are able to handle that case more gracefully.
> >>>
> >>> Signed-off-by: Fuad Tabba <tabba@xxxxxxxxxx>
> >>>
> >>> ---
> >>>
> >>> This patch could be squashed with the previous one if the maintainers
> >>> think it would be better.
> >>> ---
> >>>  include/linux/kvm_host.h |  5 +----
> >>>  mm/swap.c                | 20 +++++++++++++++++++-
> >>>  virt/kvm/guest_memfd.c   |  8 ++++++++
> >>>  3 files changed, 28 insertions(+), 5 deletions(-)
> >>>
> >>> diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
> >>> index 7788e3625f6d..3ad0719bfc4f 100644
> >>> --- a/include/linux/kvm_host.h
> >>> +++ b/include/linux/kvm_host.h
> >>> @@ -2572,10 +2572,7 @@ long kvm_arch_vcpu_pre_fault_memory(struct kvm_vcpu *vcpu,
> >>>  #endif
> >>>
> >>>  #ifdef CONFIG_KVM_GMEM_SHARED_MEM
> >>> -static inline void kvm_gmem_handle_folio_put(struct folio *folio)
> >>> -{
> >>> -	WARN_ONCE(1, "A placeholder that shouldn't trigger. Work in progress.");
> >>> -}
> >>> +void kvm_gmem_handle_folio_put(struct folio *folio);
> >>>  #endif
> >>>
> >>>  #endif
> >>> diff --git a/mm/swap.c b/mm/swap.c
> >>> index 241880a46358..27dfd75536c8 100644
> >>> --- a/mm/swap.c
> >>> +++ b/mm/swap.c
> >>> @@ -98,6 +98,24 @@ static void page_cache_release(struct folio *folio)
> >>>  		unlock_page_lruvec_irqrestore(lruvec, flags);
> >>>  }
> >>>
> >>> +#ifdef CONFIG_KVM_GMEM_SHARED_MEM
> >>> +static void gmem_folio_put(struct folio *folio)
> >>> +{
> >>> +#if IS_MODULE(CONFIG_KVM)
> >>> +	void (*fn)(struct folio *folio);
> >>> +
> >>> +	fn = symbol_get(kvm_gmem_handle_folio_put);
> >>> +	if (WARN_ON_ONCE(!fn))
> >>> +		return;
> >>> +
> >>> +	fn(folio);
> >>> +	symbol_put(kvm_gmem_handle_folio_put);
> >>> +#else
> >>> +	kvm_gmem_handle_folio_put(folio);
> >>> +#endif
> >>> +}
> >>> +#endif
> >
> > Yeah, this is not great. The vfio code isn't setting a good example to
> > follow :(
> >
> >> Sorry about the premature sending earlier!
> >>
> >> I was thinking about having a static function pointer in mm/swap.c that
> >> will be filled in when KVM is loaded and cleared when KVM is unloaded.
> >>
> >> One benefit I see is that it'll avoid the lookup that symbol_get() does
> >> on every folio_put(), but some other pinning on KVM would have to be
> >> done to prevent KVM from being unloaded in the middle of a
> >> kvm_gmem_handle_folio_put() call.
> >
> > Isn't there some "natural" dependency between things such that at the
> > point the KVM module is able to unload itself, no guest_memfd areas
> > should exist anymore, and thus no pages that would use this callback
> > should exist either? In that case it would mean there's a memory leak if
> > that happens, so while we might be trying to avoid calling a function
> > that was unloaded, we don't need to try as hard as symbol_get()/put() on
> > every invocation; a racy check would be good enough?
> > Or would such a late folio_put() be legitimate because some short-lived
> > folio_get() from e.g. a pfn scanner could prolong the page's lifetime
> > beyond the KVM module? I'd hope that since you want to make pages
> > PGTY_guestmem only at certain points of their lifetime, maybe this
> > should not be possible?
>
> IIUC the last refcount on a guest_memfd folio may not be held by
> guest_memfd if the folio has already been truncated from guest_memfd. The
> inode could already be closed, and if the inode is closed then KVM is
> free to be unloaded.
>
> This means that someone could hold on to the last refcount, unload KVM,
> and then drop the last refcount and have folio_put() call a non-existent
> callback.
>
> If we first check that folio->mapping != NULL and only then call
> kvm_gmem_handle_folio_put(), then I think what you suggested would work,
> since folio->mapping is only NULL once the folio has been disassociated
> from the inode.
>
> gmem_folio_put() should probably end with
>
>	if (folio_ref_count(folio) == 0)
>		__folio_put(folio);
>
> so that once kvm_gmem_handle_folio_put() is done with whatever it needs
> to do (e.g. completing the conversion), gmem_folio_put() will free the
> folio.

Right, this is important. Into the next respin.

Thanks,
/fuad

> >> Do you/anyone else see pros/cons either way?
> >>
> >>> +
> >>>  static void free_typed_folio(struct folio *folio)
> >>>  {
> >>>  	switch (folio_get_type(folio)) {
> >>> @@ -108,7 +126,7 @@ static void free_typed_folio(struct folio *folio)
> >>>  #endif
> >>>  #ifdef CONFIG_KVM_GMEM_SHARED_MEM
> >>>  	case PGTY_guestmem:
> >>> -		kvm_gmem_handle_folio_put(folio);
> >>> +		gmem_folio_put(folio);
> >>>  		return;
> >>>  #endif
> >>>  	default:
> >>> diff --git a/virt/kvm/guest_memfd.c b/virt/kvm/guest_memfd.c
> >>> index b2aa6bf24d3a..5fc414becae5 100644
> >>> --- a/virt/kvm/guest_memfd.c
> >>> +++ b/virt/kvm/guest_memfd.c
> >>> @@ -13,6 +13,14 @@ struct kvm_gmem {
> >>>  	struct list_head entry;
> >>>  };
> >>>
> >>> +#ifdef CONFIG_KVM_GMEM_SHARED_MEM
> >>> +void kvm_gmem_handle_folio_put(struct folio *folio)
> >>> +{
> >>> +	WARN_ONCE(1, "A placeholder that shouldn't trigger. Work in progress.");
> >>> +}
> >>> +EXPORT_SYMBOL_GPL(kvm_gmem_handle_folio_put);
> >>> +#endif /* CONFIG_KVM_GMEM_SHARED_MEM */
> >>> +
> >>>  /**
> >>>   * folio_file_pfn - like folio_file_page, but return a pfn.
> >>>   * @folio: The folio which contains this index.
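For concreteness, below is a minimal sketch of the approach converging in this
thread: a static function pointer in mm/swap.c that KVM fills in at module init
and clears at exit (Ackerley's suggestion), combined with the folio->mapping
check and the trailing __folio_put(). This is not the posted patch; the
registration helper gmem_register_folio_put() and the init/exit hooks are
hypothetical names, and it deliberately glosses over the module-pinning
question, assuming Vlastimil's point that a racy READ_ONCE() check is good
enough holds.

/* mm/swap.c -- hypothetical sketch; in-tree these headers are already pulled in */
#include <linux/mm.h>
#include <linux/export.h>

#ifdef CONFIG_KVM_GMEM_SHARED_MEM
/* Filled in by KVM at module init, cleared again at module exit. */
static void (*gmem_folio_put_cb)(struct folio *folio);

void gmem_register_folio_put(void (*cb)(struct folio *folio))
{
	WRITE_ONCE(gmem_folio_put_cb, cb);
}
EXPORT_SYMBOL_GPL(gmem_register_folio_put);

static void gmem_folio_put(struct folio *folio)
{
	void (*cb)(struct folio *folio) = READ_ONCE(gmem_folio_put_cb);

	/*
	 * folio->mapping is NULL once the folio has been disassociated
	 * from the guest_memfd inode, at which point KVM may already be
	 * unloaded; only call back into KVM while still associated.
	 */
	if (cb && folio->mapping)
		cb(folio);

	/*
	 * If the callback did not take a new reference (e.g. to finish a
	 * conversion later), free the folio here, as suggested above.
	 */
	if (folio_ref_count(folio) == 0)
		__folio_put(folio);
}
#endif /* CONFIG_KVM_GMEM_SHARED_MEM */

/* virt/kvm/guest_memfd.c -- hypothetical registration at load/unload */
#ifdef CONFIG_KVM_GMEM_SHARED_MEM
int kvm_gmem_init(void)
{
	gmem_register_folio_put(kvm_gmem_handle_folio_put);
	return 0;
}

void kvm_gmem_exit(void)
{
	gmem_register_folio_put(NULL);
}
#endif

Compared to symbol_get()/symbol_put(), this avoids a symbol lookup on every
guestmem folio_put(), and the remaining unload race is exactly the "racy check"
Vlastimil argued is acceptable, since a late folio_put() after KVM unload would
indicate a leak anyway.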