Fuad Tabba <tabba@xxxxxxxxxx> writes: > Hi Ackerley, > > On Fri, 7 Mar 2025 at 17:04, Ackerley Tng <ackerleytng@xxxxxxxxxx> wrote: >> >> Fuad Tabba <tabba@xxxxxxxxxx> writes: >> >> > Before transitioning a guest_memfd folio to unshared, thereby >> > disallowing access by the host and allowing the hypervisor to >> > transition its view of the guest page as private, we need to be >> > sure that the host doesn't have any references to the folio. >> > >> > This patch introduces a new type for guest_memfd folios, which >> > isn't activated in this series but is here as a placeholder and >> > to facilitate the code in the subsequent patch series. This will >> > be used in the future to register a callback that informs the >> > guest_memfd subsystem when the last reference is dropped, >> > therefore knowing that the host doesn't have any remaining >> > references. >> > >> > This patch also introduces the configuration option, >> > KVM_GMEM_SHARED_MEM, which toggles support for mapping >> > guest_memfd shared memory at the host. >> > >> > Signed-off-by: Fuad Tabba <tabba@xxxxxxxxxx> >> > Acked-by: Vlastimil Babka <vbabka@xxxxxxx> >> > Acked-by: David Hildenbrand <david@xxxxxxxxxx> >> > --- >> > include/linux/kvm_host.h | 7 +++++++ >> > include/linux/page-flags.h | 16 ++++++++++++++++ >> > mm/debug.c | 1 + >> > mm/swap.c | 9 +++++++++ >> > virt/kvm/Kconfig | 5 +++++ >> > 5 files changed, 38 insertions(+) >> > >> > diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h >> > index f34f4cfaa513..7788e3625f6d 100644 >> > --- a/include/linux/kvm_host.h >> > +++ b/include/linux/kvm_host.h >> > @@ -2571,4 +2571,11 @@ long kvm_arch_vcpu_pre_fault_memory(struct kvm_vcpu *vcpu, >> > struct kvm_pre_fault_memory *range); >> > #endif >> > >> > +#ifdef CONFIG_KVM_GMEM_SHARED_MEM >> > +static inline void kvm_gmem_handle_folio_put(struct folio *folio) >> > +{ >> > + WARN_ONCE(1, "A placeholder that shouldn't trigger. Work in progress."); >> > +} >> > +#endif >> > + >> > #endif >> >> Following up with the discussion at the guest_memfd biweekly call on the >> guestmem library, I think this folio_put() handler for guest_memfd could >> be the first function that's refactored out into (placeholder name) >> mm/guestmem.c. >> >> This folio_put() handler has to stay in memory even after KVM (as a >> module) is unloaded from memory, and so it is a good candidate for the >> first function in the guestmem library. >> >> Along those lines, CONFIG_KVM_GMEM_SHARED_MEM in this patch can be >> renamed CONFIG_GUESTMEM, and CONFIG_GUESTMEM will guard the existence of >> PGTY_guestmem. >> >> CONFIG_KVM_GMEM_SHARED_MEM can be introduced in the next patch of this >> series, which could, in Kconfig, select CONFIG_GUESTMEM. >> >> > diff --git a/include/linux/page-flags.h b/include/linux/page-flags.h >> > index 6dc2494bd002..daeee9a38e4c 100644 >> > --- a/include/linux/page-flags.h >> > +++ b/include/linux/page-flags.h >> > @@ -933,6 +933,7 @@ enum pagetype { >> > PGTY_slab = 0xf5, >> > PGTY_zsmalloc = 0xf6, >> > PGTY_unaccepted = 0xf7, >> > + PGTY_guestmem = 0xf8, >> > >> > PGTY_mapcount_underflow = 0xff >> > }; >> > @@ -1082,6 +1083,21 @@ FOLIO_TYPE_OPS(hugetlb, hugetlb) >> > FOLIO_TEST_FLAG_FALSE(hugetlb) >> > #endif >> > >> > +/* >> > + * guestmem folios are used to back VM memory as managed by guest_memfd. Once >> > + * the last reference is put, instead of freeing these folios back to the page >> > + * allocator, they are returned to guest_memfd. >> > + * >> > + * For now, guestmem will only be set on these folios as long as they cannot be >> > + * mapped to user space ("private state"), with the plan of always setting that >> > + * type once typed folios can be mapped to user space cleanly. >> > + */ >> > +#ifdef CONFIG_KVM_GMEM_SHARED_MEM >> > +FOLIO_TYPE_OPS(guestmem, guestmem) >> > +#else >> > +FOLIO_TEST_FLAG_FALSE(guestmem) >> > +#endif >> > + >> > PAGE_TYPE_OPS(Zsmalloc, zsmalloc, zsmalloc) >> > >> > /* >> > diff --git a/mm/debug.c b/mm/debug.c >> > index 8d2acf432385..08bc42c6cba8 100644 >> > --- a/mm/debug.c >> > +++ b/mm/debug.c >> > @@ -56,6 +56,7 @@ static const char *page_type_names[] = { >> > DEF_PAGETYPE_NAME(table), >> > DEF_PAGETYPE_NAME(buddy), >> > DEF_PAGETYPE_NAME(unaccepted), >> > + DEF_PAGETYPE_NAME(guestmem), >> > }; >> > >> > static const char *page_type_name(unsigned int page_type) >> > diff --git a/mm/swap.c b/mm/swap.c >> > index 47bc1bb919cc..241880a46358 100644 >> > --- a/mm/swap.c >> > +++ b/mm/swap.c >> > @@ -38,6 +38,10 @@ >> > #include <linux/local_lock.h> >> > #include <linux/buffer_head.h> >> > >> > +#ifdef CONFIG_KVM_GMEM_SHARED_MEM >> > +#include <linux/kvm_host.h> >> > +#endif >> > + >> > #include "internal.h" >> > >> > #define CREATE_TRACE_POINTS >> > @@ -101,6 +105,11 @@ static void free_typed_folio(struct folio *folio) >> > case PGTY_hugetlb: >> > free_huge_folio(folio); >> > return; >> > +#endif >> > +#ifdef CONFIG_KVM_GMEM_SHARED_MEM >> > + case PGTY_guestmem: >> > + kvm_gmem_handle_folio_put(folio); >> > + return; >> > #endif >> > default: >> > WARN_ON_ONCE(1); >> > diff --git a/virt/kvm/Kconfig b/virt/kvm/Kconfig >> > index 54e959e7d68f..37f7734cb10f 100644 >> > --- a/virt/kvm/Kconfig >> > +++ b/virt/kvm/Kconfig >> > @@ -124,3 +124,8 @@ config HAVE_KVM_ARCH_GMEM_PREPARE >> > config HAVE_KVM_ARCH_GMEM_INVALIDATE >> > bool >> > depends on KVM_PRIVATE_MEM >> > + >> > +config KVM_GMEM_SHARED_MEM >> > + select KVM_PRIVATE_MEM >> > + depends on !KVM_GENERIC_MEMORY_ATTRIBUTES >> >> Enforcing that KVM_GENERIC_MEMORY_ATTRIBUTES is not selected should not >> be a strict requirement. Fuad explained in an offline chat that this is >> just temporary. >> >> If we have CONFIG_GUESTMEM, then this question is moot, I think >> CONFIG_GUESTMEM would just be independent of everything else; other >> configs would depend on CONFIG_GUESTMEM. > > There are two things here. First of all, the unfortunate naming > situation where PRIVATE could mean GUESTMEM, or private could mean not > shared. I plan to tackle this aspect (i.e., the naming) in a separate > patch series, since that will surely generate a lot of debate :) > Oops. By "depend on CONFIG_GUESTMEM" I meant "depend on the introduction of the guestmem shim". I think this is a good time to introduce the shim because the folio_put() callback needs to be in mm and not just in KVM, which is a loadable module and hence can be removed from memory. If we do introduce the shim, the config flag CONFIG_KVM_GMEM_SHARED_MEM will be replaced by CONFIG_GUESTMEM (or other name), and then the question on depending on !KVM_GENERIC_MEMORY_ATTRIBUTES will be moot since I think an mm config flag wouldn't place a constraint on a module config flag? When I wrote this, I thought that config flags are easily renamed since they're an interface and are user-facing, but I realized config flag renaming seems to be easily renamed based on this search [1]. If we're going with renaming in a separate patch series, some mechanism should be introduced here to handle the case where 1. Kernel (and KVM module) is compiled with KVM_GMEM_SHARED_MEM set 2. KVM is unloaded 3. folio_put() tries to call kvm_gmem_handle_folio_put() > The other part is that, with shared memory in-place, the memory > attributes are an orthogonal matter. The attributes are the userpace's > view of what it expects the state of the memory to be, and are used to > multiplex whether the memory being accessed is guest_memfd or the > regular (i.e., most likely anonymous) memory used normally by KVM. > > This behavior however would be architecture, or even vm-type specific. > I agree it is orthogonal but I'm calling this out because "depends on !KVM_GENERIC_MEMORY_ATTRIBUTES" means if I set CONFIG_KVM_GENERIC_MEMORY_ATTRIBUTES, I can't use PGTY_guestmem since CONFIG_KVM_GMEM_SHARED_MEM would get unset. I was trying to test this with a KVM_X86_SW_PROTECTED_VM, setting up for using the ioctl to convert memory and hit this issue. > Cheers, > /fuad > >> > + bool [1] https://lore.kernel.org/all/?q=s%3Arename+dfn%3AKconfig+merged