On 4/25/24 7:52 AM, Paolo Bonzini wrote:
> On 4/4/24 20:50, Paolo Bonzini wrote:
>> KVM would like to add a ioctl to encrypt and install a page into private
>> memory (i.e. into a guest_memfd), in preparation for launching an
>> encrypted guest.
>>
>> This API should be used only once per page (unless there are failures),
>> so we want to rule out the possibility of operating on a page that is
>> already in the guest_memfd's filemap. Overwriting the page is almost
>> certainly a sign of a bug, so we might as well forbid it.
>>
>> Therefore, introduce a new flag for __filemap_get_folio (to be passed
>> together with FGP_CREAT) that allows *adding* a new page to the filemap
>> but not returning an existing one.
>>
>> An alternative possibility would be to force KVM users to initialize
>> the whole filemap in one go, but that is complicated by the fact that
>> the filemap includes pages of different kinds, including some that are
>> per-vCPU rather than per-VM. Basically the result would be closer to
>> a system call that multiplexes multiple ioctls, than to something
>> cleaner like readv/writev.
>>
>> Races between callers that pass FGP_CREAT_ONLY are uninteresting to
>> the filemap code: one of the racers wins and one fails with EEXIST,
>> similar to calling open(2) with O_CREAT|O_EXCL. It doesn't matter to
>> filemap.c if the missing synchronization is in the kernel or in userspace,
>> and in fact it could even be intentional. (In the case of KVM it turns
>> out that a mutex is taken around these calls for unrelated reasons,
>> so there can be no races.)
>>
>> Cc: Matthew Wilcox <willy@xxxxxxxxxxxxx>
>> Cc: Yosry Ahmed <yosryahmed@xxxxxxxxxx>
>> Signed-off-by: Paolo Bonzini <pbonzini@xxxxxxxxxx>
>
> Matthew, are your objections still valid or could I have your ack?

So per the sub-thread on PATCH 09/11, IIUC this is now moot, right?
Vlastimil

> Thanks,
>
> Paolo
>
>> ---
>>  include/linux/pagemap.h | 2 ++
>>  mm/filemap.c | 4 ++++
>>  2 files changed, 6 insertions(+)
>>
>> diff --git a/include/linux/pagemap.h b/include/linux/pagemap.h
>> index f879c1d54da7..a8c0685e8c08 100644
>> --- a/include/linux/pagemap.h
>> +++ b/include/linux/pagemap.h
>> @@ -587,6 +587,7 @@ pgoff_t page_cache_prev_miss(struct address_space *mapping,
>>   * * %FGP_CREAT - If no folio is present then a new folio is allocated,
>>   *   added to the page cache and the VM's LRU list. The folio is
>>   *   returned locked.
>> + * * %FGP_CREAT_ONLY - Fail if a folio is present
>>   * * %FGP_FOR_MMAP - The caller wants to do its own locking dance if the
>>   *   folio is already in cache. If the folio was allocated, unlock it
>>   *   before returning so the caller can do the same dance.
>> @@ -607,6 +608,7 @@ typedef unsigned int __bitwise fgf_t;
>>  #define FGP_NOWAIT ((__force fgf_t)0x00000020)
>>  #define FGP_FOR_MMAP ((__force fgf_t)0x00000040)
>>  #define FGP_STABLE ((__force fgf_t)0x00000080)
>> +#define FGP_CREAT_ONLY ((__force fgf_t)0x00000100)
>>  #define FGF_GET_ORDER(fgf) (((__force unsigned)fgf) >> 26) /* top 6 bits */
>>
>>  #define FGP_WRITEBEGIN (FGP_LOCK | FGP_WRITE | FGP_CREAT | FGP_STABLE)
>> diff --git a/mm/filemap.c b/mm/filemap.c
>> index 7437b2bd75c1..e7440e189ebd 100644
>> --- a/mm/filemap.c
>> +++ b/mm/filemap.c
>> @@ -1863,6 +1863,10 @@ struct folio *__filemap_get_folio(struct address_space *mapping, pgoff_t index,
>>  		folio = NULL;
>>  	if (!folio)
>>  		goto no_page;
>> +	if (fgp_flags & FGP_CREAT_ONLY) {
>> +		folio_put(folio);
>> +		return ERR_PTR(-EEXIST);
>> +	}
>>
>>  	if (fgp_flags & FGP_LOCK) {
>>  		if (fgp_flags & FGP_NOWAIT) {
>
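
As an aside on the O_CREAT|O_EXCL comparison in the quoted changelog: a
minimal userspace sketch of that behaviour could look like the following.
create_exclusive() is a hypothetical helper made up for illustration; it is
not part of the patch and says nothing about how KVM or guest_memfd would
call __filemap_get_folio(). It only shows the "first caller wins, later
callers get EEXIST" semantics that FGP_CREAT_ONLY is described as mirroring.

```c
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/*
 * Hypothetical helper (illustration only, not from the patch):
 * create `path` only if it does not exist yet.
 * Returns 0 on success or a negative errno value on failure.
 */
static int create_exclusive(const char *path)
{
	/* O_EXCL together with O_CREAT makes open(2) fail if the file exists. */
	int fd = open(path, O_CREAT | O_EXCL | O_WRONLY, 0600);

	if (fd < 0)
		return -errno;	/* -EEXIST when somebody else created it first */

	close(fd);
	return 0;
}
```

Calling create_exclusive() twice on the same, previously absent path should
return 0 the first time and -EEXIST the second, regardless of whether the
two callers coordinate with each other.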