Re: [PATCH v6 03/10] KVM: guest_memfd: Handle kvm_gmem_handle_folio_put() for KVM as a module

Vlastimil Babka <vbabka@xxxxxxx> writes:

> On 3/13/25 14:49, Ackerley Tng wrote:
>> Fuad Tabba <tabba@xxxxxxxxxx> writes:
>> 
>>> In some architectures, KVM could be defined as a module. If there is a
>>> pending folio_put() while KVM is unloaded, the system could crash. By
>>> having a helper check for that and call the function only if it's
>>> available, we are able to handle that case more gracefully.
>>>
>>> Signed-off-by: Fuad Tabba <tabba@xxxxxxxxxx>
>>>
>>> ---
>>>
>>> This patch could be squashed with the previous one if the maintainers
>>> think it would be better.
>>> ---
>>>  include/linux/kvm_host.h |  5 +----
>>>  mm/swap.c                | 20 +++++++++++++++++++-
>>>  virt/kvm/guest_memfd.c   |  8 ++++++++
>>>  3 files changed, 28 insertions(+), 5 deletions(-)
>>>
>>> diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
>>> index 7788e3625f6d..3ad0719bfc4f 100644
>>> --- a/include/linux/kvm_host.h
>>> +++ b/include/linux/kvm_host.h
>>> @@ -2572,10 +2572,7 @@ long kvm_arch_vcpu_pre_fault_memory(struct kvm_vcpu *vcpu,
>>>  #endif
>>>  
>>>  #ifdef CONFIG_KVM_GMEM_SHARED_MEM
>>> -static inline void kvm_gmem_handle_folio_put(struct folio *folio)
>>> -{
>>> -	WARN_ONCE(1, "A placeholder that shouldn't trigger. Work in progress.");
>>> -}
>>> +void kvm_gmem_handle_folio_put(struct folio *folio);
>>>  #endif
>>>  
>>>  #endif
>>> diff --git a/mm/swap.c b/mm/swap.c
>>> index 241880a46358..27dfd75536c8 100644
>>> --- a/mm/swap.c
>>> +++ b/mm/swap.c
>>> @@ -98,6 +98,24 @@ static void page_cache_release(struct folio *folio)
>>>  		unlock_page_lruvec_irqrestore(lruvec, flags);
>>>  }
>>>  
>>> +#ifdef CONFIG_KVM_GMEM_SHARED_MEM
>>> +static void gmem_folio_put(struct folio *folio)
>>> +{
>>> +#if IS_MODULE(CONFIG_KVM)
>>> +	void (*fn)(struct folio *folio);
>>> +
>>> +	fn = symbol_get(kvm_gmem_handle_folio_put);
>>> +	if (WARN_ON_ONCE(!fn))
>>> +		return;
>>> +
>>> +	fn(folio);
>>> +	symbol_put(kvm_gmem_handle_folio_put);
>>> +#else
>>> +	kvm_gmem_handle_folio_put(folio);
>>> +#endif
>>> +}
>>> +#endif
>
> Yeah, this is not great. The vfio code isn't setting a good example to follow :(
>
>> Sorry about the premature sending earlier!
>> 
>> I was thinking about having a static function pointer in mm/swap.c that
>> will be filled in when KVM is loaded and cleared when KVM is unloaded.
>> 
>> One benefit I see is that it'll avoid the lookup that symbol_get() does
>> on every folio_put(), but some other pinning on KVM would have to be
>> done to prevent KVM from being unloaded in the middle of
>> kvm_gmem_handle_folio_put() call.
>
> Isn't there some "natural" dependency between things such that by the point
> the KVM module is able to unload itself, no guest_memfd areas should exist
> anymore, and thus no pages that would use this callback should exist either?
> In that case it would mean there's a memory leak if that happens, so while
> we might be trying to avoid calling a function that was unloaded, we don't
> need to try as hard as symbol_get()/put() on every invocation, and a racy
> check would be good enough?
> Or could such a late folio_put() legitimately happen because some
> short-lived folio_get() from e.g. a pfn scanner prolongs the page's
> lifetime beyond the KVM module? I'd hope that since you want to make pages
> PGTY_guestmem only at certain points of their lifetime, this should not be
> possible?
>

IIUC the last refcount on a guest_memfd folio may not be held by
guest_memfd if the folio has already been truncated from guest_memfd. The
inode could already be closed, and once the inode is closed, KVM is free
to be unloaded.

This means that someone could hold on to the last refcount, unload KVM,
and then drop the last refcount and have the folio_put() call a
non-existent callback.

If we first check that folio->mapping != NULL and only then call
kvm_gmem_handle_folio_put(), I think what you suggested would work, since
folio->mapping only becomes NULL once the folio has been disassociated
from the inode.

gmem_folio_put() should probably end with

if (folio_ref_count(folio) == 0)
	__folio_put(folio);

so that if kvm_gmem_handle_folio_put() is done with whatever it needs to do
(e.g. completing the conversion), gmem_folio_put() will free the folio.
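
Putting the two together, roughly what I have in mind (an untested sketch,
keeping the module-symbol indirection from the patch just for illustration;
the exact checks and ordering would still need to be worked out):

#ifdef CONFIG_KVM_GMEM_SHARED_MEM
static void gmem_folio_put(struct folio *folio)
{
	/*
	 * folio->mapping is only NULL once the folio has been disassociated
	 * from the guest_memfd inode; in that case there is nothing for KVM
	 * to do and the module may already be gone.
	 */
	if (folio->mapping) {
#if IS_MODULE(CONFIG_KVM)
		void (*fn)(struct folio *folio);

		fn = symbol_get(kvm_gmem_handle_folio_put);
		if (!WARN_ON_ONCE(!fn)) {
			fn(folio);
			symbol_put(kvm_gmem_handle_folio_put);
		}
#else
		kvm_gmem_handle_folio_put(folio);
#endif
	}

	/*
	 * If the callback (or skipping it) left the refcount at zero, e.g.
	 * because the conversion is already complete, free the folio here
	 * instead of leaking it.
	 */
	if (folio_ref_count(folio) == 0)
		__folio_put(folio);
}
#endif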

>> Do you/anyone else see pros/cons either way?
>> 
>>> +
>>>  static void free_typed_folio(struct folio *folio)
>>>  {
>>>  	switch (folio_get_type(folio)) {
>>> @@ -108,7 +126,7 @@ static void free_typed_folio(struct folio *folio)
>>>  #endif
>>>  #ifdef CONFIG_KVM_GMEM_SHARED_MEM
>>>  	case PGTY_guestmem:
>>> -		kvm_gmem_handle_folio_put(folio);
>>> +		gmem_folio_put(folio);
>>>  		return;
>>>  #endif
>>>  	default:
>>> diff --git a/virt/kvm/guest_memfd.c b/virt/kvm/guest_memfd.c
>>> index b2aa6bf24d3a..5fc414becae5 100644
>>> --- a/virt/kvm/guest_memfd.c
>>> +++ b/virt/kvm/guest_memfd.c
>>> @@ -13,6 +13,14 @@ struct kvm_gmem {
>>>  	struct list_head entry;
>>>  };
>>>  
>>> +#ifdef CONFIG_KVM_GMEM_SHARED_MEM
>>> +void kvm_gmem_handle_folio_put(struct folio *folio)
>>> +{
>>> +	WARN_ONCE(1, "A placeholder that shouldn't trigger. Work in progress.");
>>> +}
>>> +EXPORT_SYMBOL_GPL(kvm_gmem_handle_folio_put);
>>> +#endif /* CONFIG_KVM_GMEM_SHARED_MEM */
>>> +
>>>  /**
>>>   * folio_file_pfn - like folio_file_page, but return a pfn.
>>>   * @folio: The folio which contains this index.



