Hi Ackerley,

On Mon, 17 Mar 2025 at 16:27, Ackerley Tng <ackerleytng@xxxxxxxxxx> wrote:
>
> Vlastimil Babka <vbabka@xxxxxxx> writes:
>
> > On 3/13/25 14:49, Ackerley Tng wrote:
> >> Fuad Tabba <tabba@xxxxxxxxxx> writes:
> >>
> >>> In some architectures, KVM could be defined as a module. If there is a
> >>> pending folio_put() while KVM is unloaded, the system could crash. By
> >>> having a helper check for that and call the function only if it's
> >>> available, we are able to handle that case more gracefully.
> >>>
> >>> Signed-off-by: Fuad Tabba <tabba@xxxxxxxxxx>
> >>>
> >>> ---
> >>>
> >>> This patch could be squashed with the previous one if the maintainers
> >>> think it would be better.
> >>> ---
> >>>  include/linux/kvm_host.h |  5 +----
> >>>  mm/swap.c                | 20 +++++++++++++++++++-
> >>>  virt/kvm/guest_memfd.c   |  8 ++++++++
> >>>  3 files changed, 28 insertions(+), 5 deletions(-)
> >>>
> >>> diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
> >>> index 7788e3625f6d..3ad0719bfc4f 100644
> >>> --- a/include/linux/kvm_host.h
> >>> +++ b/include/linux/kvm_host.h
> >>> @@ -2572,10 +2572,7 @@ long kvm_arch_vcpu_pre_fault_memory(struct kvm_vcpu *vcpu,
> >>>  #endif
> >>>
> >>>  #ifdef CONFIG_KVM_GMEM_SHARED_MEM
> >>> -static inline void kvm_gmem_handle_folio_put(struct folio *folio)
> >>> -{
> >>> -	WARN_ONCE(1, "A placeholder that shouldn't trigger. Work in progress.");
> >>> -}
> >>> +void kvm_gmem_handle_folio_put(struct folio *folio);
> >>>  #endif
> >>>
> >>>  #endif
> >>> diff --git a/mm/swap.c b/mm/swap.c
> >>> index 241880a46358..27dfd75536c8 100644
> >>> --- a/mm/swap.c
> >>> +++ b/mm/swap.c
> >>> @@ -98,6 +98,24 @@ static void page_cache_release(struct folio *folio)
> >>>  		unlock_page_lruvec_irqrestore(lruvec, flags);
> >>>  }
> >>>
> >>> +#ifdef CONFIG_KVM_GMEM_SHARED_MEM
> >>> +static void gmem_folio_put(struct folio *folio)
> >>> +{
> >>> +#if IS_MODULE(CONFIG_KVM)
> >>> +	void (*fn)(struct folio *folio);
> >>> +
> >>> +	fn = symbol_get(kvm_gmem_handle_folio_put);
> >>> +	if (WARN_ON_ONCE(!fn))
> >>> +		return;
> >>> +
> >>> +	fn(folio);
> >>> +	symbol_put(kvm_gmem_handle_folio_put);
> >>> +#else
> >>> +	kvm_gmem_handle_folio_put(folio);
> >>> +#endif
> >>> +}
> >>> +#endif
> >
> > Yeah, this is not great. The vfio code isn't setting a good example to
> > follow :(
> >
> >> Sorry about the premature sending earlier!
> >>
> >> I was thinking about having a static function pointer in mm/swap.c that
> >> will be filled in when KVM is loaded and cleared when KVM is unloaded.
> >>
> >> One benefit I see is that it'll avoid the lookup that symbol_get() does
> >> on every folio_put(), but some other pinning on KVM would have to be
> >> done to prevent KVM from being unloaded in the middle of a
> >> kvm_gmem_handle_folio_put() call.
> >
> > Isn't there some "natural" dependency between things such that at the
> > point the KVM module is able to unload itself, no guest_memfd areas
> > should exist anymore, and thus no pages that would use this callback
> > should exist either? In that case it would mean there's a memory leak if
> > that happens, so while we might be trying to avoid calling a function
> > that was unloaded, we don't need to try as hard as symbol_get()/put() on
> > every invocation; a racy check would be good enough?
> > Or would such a late folio_put() be legitimate because some short-lived
> > folio_get() from e.g. a pfn scanner could prolong the page's lifetime
> > beyond the KVM module? I'd hope that since you want to make pages
> > PGTY_guestmem only at certain points of their lifetime, maybe this
> > should not be possible?
>
> IIUC the last refcount on a guest_memfd folio may not be held by
> guest_memfd if the folio has already been truncated from guest_memfd. The
> inode could already be closed, and if the inode is closed then KVM is
> free to be unloaded.
>
> This means that someone could hold on to the last refcount, unload KVM,
> and then drop the last refcount and have folio_put() call a non-existent
> callback.
>
> If we first check that folio->mapping != NULL and only then call
> kvm_gmem_handle_folio_put(), then I think what you suggested would work,
> since folio->mapping is only NULL once the folio has been disassociated
> from the inode.
>
> gmem_folio_put() should probably end with
>
>	if (folio_ref_count(folio) == 0)
>		__folio_put(folio);
>
> so that once kvm_gmem_handle_folio_put() is done with whatever it needs
> to do (e.g. completing the conversion), gmem_folio_put() will free the
> folio.

Right, this is important. Into the next respin.

Thanks,
/fuad

> >> Do you/anyone else see pros/cons either way?
> >>
> >>> +
> >>>  static void free_typed_folio(struct folio *folio)
> >>>  {
> >>>  	switch (folio_get_type(folio)) {
> >>> @@ -108,7 +126,7 @@ static void free_typed_folio(struct folio *folio)
> >>>  #endif
> >>>  #ifdef CONFIG_KVM_GMEM_SHARED_MEM
> >>>  	case PGTY_guestmem:
> >>> -		kvm_gmem_handle_folio_put(folio);
> >>> +		gmem_folio_put(folio);
> >>>  		return;
> >>>  #endif
> >>>  	default:
> >>> diff --git a/virt/kvm/guest_memfd.c b/virt/kvm/guest_memfd.c
> >>> index b2aa6bf24d3a..5fc414becae5 100644
> >>> --- a/virt/kvm/guest_memfd.c
> >>> +++ b/virt/kvm/guest_memfd.c
> >>> @@ -13,6 +13,14 @@ struct kvm_gmem {
> >>>  	struct list_head entry;
> >>>  };
> >>>
> >>> +#ifdef CONFIG_KVM_GMEM_SHARED_MEM
> >>> +void kvm_gmem_handle_folio_put(struct folio *folio)
> >>> +{
> >>> +	WARN_ONCE(1, "A placeholder that shouldn't trigger. Work in progress.");
> >>> +}
> >>> +EXPORT_SYMBOL_GPL(kvm_gmem_handle_folio_put);
> >>> +#endif /* CONFIG_KVM_GMEM_SHARED_MEM */
> >>> +
> >>>  /**
> >>>   * folio_file_pfn - like folio_file_page, but return a pfn.
> >>>   * @folio: The folio which contains this index.
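For concreteness, below is a minimal sketch of the approach converging in this
thread: a static function pointer in mm/swap.c that KVM fills in at module init
and clears at exit (Ackerley's suggestion), combined with the folio->mapping
check and the trailing __folio_put(). This is not the posted patch; the
registration helper gmem_register_folio_put() and the init/exit hooks are
hypothetical names, and it deliberately glosses over the module-pinning
question, assuming Vlastimil's point that a racy READ_ONCE() check is good
enough holds.

/* mm/swap.c -- hypothetical sketch; in-tree these headers are already pulled in */
#include <linux/mm.h>
#include <linux/export.h>

#ifdef CONFIG_KVM_GMEM_SHARED_MEM
/* Filled in by KVM at module init, cleared again at module exit. */
static void (*gmem_folio_put_cb)(struct folio *folio);

void gmem_register_folio_put(void (*cb)(struct folio *folio))
{
	WRITE_ONCE(gmem_folio_put_cb, cb);
}
EXPORT_SYMBOL_GPL(gmem_register_folio_put);

static void gmem_folio_put(struct folio *folio)
{
	void (*cb)(struct folio *folio) = READ_ONCE(gmem_folio_put_cb);

	/*
	 * folio->mapping is NULL once the folio has been disassociated
	 * from the guest_memfd inode, at which point KVM may already be
	 * unloaded; only call back into KVM while still associated.
	 */
	if (cb && folio->mapping)
		cb(folio);

	/*
	 * If the callback did not take a new reference (e.g. to finish a
	 * conversion later), free the folio here, as suggested above.
	 */
	if (folio_ref_count(folio) == 0)
		__folio_put(folio);
}
#endif /* CONFIG_KVM_GMEM_SHARED_MEM */

/* virt/kvm/guest_memfd.c -- hypothetical registration at load/unload */
#ifdef CONFIG_KVM_GMEM_SHARED_MEM
int kvm_gmem_init(void)
{
	gmem_register_folio_put(kvm_gmem_handle_folio_put);
	return 0;
}

void kvm_gmem_exit(void)
{
	gmem_register_folio_put(NULL);
}
#endif

Compared to symbol_get()/symbol_put(), this avoids a symbol lookup on every
guestmem folio_put(), and the remaining unload race is exactly the "racy check"
Vlastimil argued is acceptable, since a late folio_put() after KVM unload would
indicate a leak anyway.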