Elliot Berman <quic_eberman@xxxxxxxxxxx> writes:

> In preparation for adding more features to KVM's guest_memfd, refactor
> and introduce a library which abstracts some of the core-mm decisions
> about managing folios associated with the file. The goal of the refactor
> serves two purposes:
>
> Provide an easier way to reason about memory in guest_memfd. With KVM
> supporting multiple confidentiality models (TDX, SEV-SNP, pKVM, ARM
> CCA), and coming support for allowing kernel and userspace to access
> this memory, it seems necessary to create a stronger abstraction between
> core-mm concerns and hypervisor concerns.
>
> Provide a common implementation for other hypervisors (Gunyah) to use.
>
> Signed-off-by: Elliot Berman <quic_eberman@xxxxxxxxxxx>
> ---
> include/linux/guest_memfd.h | 44 +++++++
> mm/Kconfig | 3 +
> mm/Makefile | 1 +
> mm/guest_memfd.c | 285 ++++++++++++++++++++++++++++++++++++++++++++
> 4 files changed, 333 insertions(+)
>
> diff --git a/include/linux/guest_memfd.h b/include/linux/guest_memfd.h
> new file mode 100644
> index 000000000000..be56d9d53067
> --- /dev/null
> +++ b/include/linux/guest_memfd.h
> @@ -0,0 +1,44 @@
> +/* SPDX-License-Identifier: GPL-2.0-only */
> +/*
> + * Copyright (c) 2024 Qualcomm Innovation Center, Inc. All rights reserved.
> + */
> +
> +#ifndef _LINUX_GUEST_MEMFD_H
> +#define _LINUX_GUEST_MEMFD_H
> +
> +#include <linux/fs.h>
> +
> +/**
> + * struct guest_memfd_operations - ops provided by owner to manage folios
> + * @invalidate_begin: called when folios should be unmapped from guest.
> + * May fail if folios couldn't be unmapped from guest.
> + * Required.
> + * @invalidate_end: called after invalidate_begin returns success. Optional.
> + * @prepare: called before a folio is mapped into the guest address space.
> + * Optional.
> + * @release: Called when releasing the guest_memfd file. Required.
> + */
> +struct guest_memfd_operations {
> +	int (*invalidate_begin)(struct inode *inode, pgoff_t offset, unsigned long nr);
> +	void (*invalidate_end)(struct inode *inode, pgoff_t offset, unsigned long nr);
> +	int (*prepare)(struct inode *inode, pgoff_t offset, struct folio *folio);
> +	int (*release)(struct inode *inode);
> +};
> +
> +/**
> + * @GUEST_MEMFD_GRAB_UPTODATE: Ensure pages are zeroed/up to date.
> + * If trusted hyp will do it, can ommit this flag
> + * @GUEST_MEMFD_PREPARE: Call the ->prepare() op, if present.
> + */
> +enum {
> +	GUEST_MEMFD_GRAB_UPTODATE = BIT(0),
> +	GUEST_MEMFD_PREPARE = BIT(1),
> +};

My reading of the code after patch [1] is that the uptodate flag now
means "prepared for guest use", so the two enum values here are
probably the same thing. For SEV, preparing means calling
rmp_make_private(), so I guess that when the page is allowed to be
faulted in to userspace, rmp_make_shared() would have to be called on
it.

Shall we keep the uptodate flag meaning "prepared for guest use"
(whether for shared or private use), and add another enum value to
request a zeroed page (which would have no accompanying page flag)?

Or could we drop the zeroing feature altogether, since zeroing was
meant to be handled by the trusted hypervisor or hardware in the first
place? It was listed as a TODO before being removed in [2].

I like the idea of having flags to control what is done to the page;
removing the page from the direct map could perhaps be yet another
enum value.
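To make the proposed split concrete, here is a userspace sketch under stated assumptions: the flag names GMEM_GRAB_PREPARE, GMEM_GRAB_ZERO and GMEM_GRAB_NO_DIRECT_MAP, struct toy_folio, toy_prepare() and toy_grab() are all hypothetical, and the uptodate bit is assumed to mean "prepared for guest use". It only illustrates the flag semantics being discussed, not the real guest_memfd_grab_folio().

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define BIT(n) (1u << (n))

/*
 * Hypothetical flag split (names are illustrative, not from the patch):
 * PREPARE is tracked by the folio's uptodate bit, ZERO has no page flag
 * of its own, and NO_DIRECT_MAP shows where another bit could go.
 */
enum {
	GMEM_GRAB_PREPARE       = BIT(0),
	GMEM_GRAB_ZERO          = BIT(1),
	GMEM_GRAB_NO_DIRECT_MAP = BIT(2), /* not modelled in this toy */
};

/* Userspace stand-in for a kernel folio. */
struct toy_folio {
	bool uptodate;          /* assumed to mean "prepared for guest use" */
	int prepare_calls;      /* how many times the prepare op ran */
	unsigned char data[16]; /* stand-in for the folio's contents */
};

/* Stand-in for the owner's ->prepare() op (e.g. rmp_make_private() on SEV). */
static int toy_prepare(struct toy_folio *folio)
{
	folio->prepare_calls++;
	return 0;
}

/* Hypothetical grab helper honouring the split flags. */
static int toy_grab(struct toy_folio *folio, uint32_t flags)
{
	/* Once prepared, the folio belongs to the guest: leave it alone. */
	if (folio->uptodate)
		return 0;

	/* Zeroing is not tracked by any flag; it only happens before prepare. */
	if (flags & GMEM_GRAB_ZERO)
		memset(folio->data, 0, sizeof(folio->data));

	if (flags & GMEM_GRAB_PREPARE) {
		int ret = toy_prepare(folio);

		if (ret)
			return ret;
		folio->uptodate = true;
	}
	return 0;
}
```

Because zeroing is skipped once the folio is uptodate, no separate page flag is needed to remember it; dropping zeroing entirely, as also suggested above, would just remove that branch.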
> +
> +struct folio *guest_memfd_grab_folio(struct file *file, pgoff_t index, u32 flags);
> +struct file *guest_memfd_alloc(const char *name,
> +			       const struct guest_memfd_operations *ops,
> +			       loff_t size, unsigned long flags);
> +bool is_guest_memfd(struct file *file, const struct guest_memfd_operations *ops);
> +
> +#endif

<snip>

[1] https://lore.kernel.org/all/20240726185157.72821-15-pbonzini@xxxxxxxxxx/
[2] https://lore.kernel.org/all/20240726185157.72821-8-pbonzini@xxxxxxxxxx/