On Thu, Jul 21, 2022 at 03:05:09PM +0000, Sean Christopherson wrote:
> On Thu, Jul 21, 2022, David Hildenbrand wrote:
> > On 21.07.22 11:44, David Hildenbrand wrote:
> > > On 06.07.22 10:20, Chao Peng wrote:
> > >> Normally, a write to unallocated space of a file or the hole of a sparse
> > >> file automatically causes space allocation, for memfd, this equals to
> > >> memory allocation. This new seal prevents such automatically allocating,
> > >> either this is from a direct write() or a write on the previously
> > >> mmap-ed area. The seal does not prevent fallocate() so an explicit
> > >> fallocate() can still cause allocating and can be used to reserve
> > >> memory.
> > >>
> > >> This is used to prevent unintentional allocation from userspace on a
> > >> stray or careless write and any intentional allocation should use an
> > >> explicit fallocate(). One of the main usecases is to avoid memory double
> > >> allocation for confidential computing usage where we use two memfds to
> > >> back guest memory and at a single point only one memfd is alive and we
> > >> want to prevent memory allocation for the other memfd which may have
> > >> been mmap-ed previously. More discussion can be found at:
> > >>
> > >> https://lkml.org/lkml/2022/6/14/1255
> > >>
> > >> Suggested-by: Sean Christopherson <seanjc@xxxxxxxxxx>
> > >> Signed-off-by: Chao Peng <chao.p.peng@xxxxxxxxxxxxxxx>
> > >> ---
> > >>  include/uapi/linux/fcntl.h |  1 +
> > >>  mm/memfd.c                 |  3 ++-
> > >>  mm/shmem.c                 | 16 ++++++++++++++--
> > >>  3 files changed, 17 insertions(+), 3 deletions(-)
> > >>
> > >> diff --git a/include/uapi/linux/fcntl.h b/include/uapi/linux/fcntl.h
> > >> index 2f86b2ad6d7e..98bdabc8e309 100644
> > >> --- a/include/uapi/linux/fcntl.h
> > >> +++ b/include/uapi/linux/fcntl.h
> > >> @@ -43,6 +43,7 @@
> > >>  #define F_SEAL_GROW 0x0004 /* prevent file from growing */
> > >>  #define F_SEAL_WRITE 0x0008 /* prevent writes */
> > >>  #define F_SEAL_FUTURE_WRITE 0x0010 /* prevent future writes while mapped */
> > >> +#define F_SEAL_AUTO_ALLOCATE 0x0020 /* prevent allocation for writes */
> > >
> > > Why only "on writes" and not "on reads". IIRC, shmem doesn't support the
> > > shared zeropage, so you'll simply allocate a new page via read() or on
> > > read faults.
> >
> > Correction: on read() we don't allocate a fresh page. But on read faults
> > we would. So this comment here needs clarification.
>
> Not just the comment, the code too. The intent of F_SEAL_AUTO_ALLOCATE is very
> much to block _all_ implicit allocations (or maybe just fault-based allocations
> if "implicit" is too broad of a description).

So maybe we should still go with your initial suggestion,
F_SEAL_FAULT_ALLOCATIONS? One reason I don't like that name is that the
write() syscall also causes allocation, and we want to prevent that too.

Chao
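
For reference, the allocation paths discussed in this thread can be
observed on an ordinary, unsealed memfd. The sketch below does not use
F_SEAL_AUTO_ALLOCATE at all, since that seal exists only in this patch;
it just reports st_blocks after each kind of access. The names
blocks_after() and enum touch are made up for illustration, and the
expectations in the comments assume a Linux shmem-backed memfd and a
glibc that provides memfd_create().

```c
#define _GNU_SOURCE
#include <fcntl.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical names, for illustration only. */
enum touch {
    TOUCH_NONE,         /* ftruncate() only: file is one big hole     */
    TOUCH_READ,         /* read() from the hole                       */
    TOUCH_READ_FAULT,   /* read through a shared mapping              */
    TOUCH_WRITE,        /* write() into the hole                      */
    TOUCH_WRITE_FAULT,  /* store through a shared mapping             */
    TOUCH_FALLOCATE     /* explicit reservation                       */
};

/*
 * Create a fresh, fully sparse memfd of `size` bytes, access it in the
 * requested way, and return the number of 512-byte blocks backing it
 * afterwards (-1 on any error).
 */
static long long blocks_after(enum touch how, off_t size)
{
    struct stat st;
    long long blocks = -1;
    char byte = 0;
    int ok = 1;
    int prot = (how == TOUCH_WRITE_FAULT) ? PROT_READ | PROT_WRITE : PROT_READ;
    int fd = memfd_create("auto-alloc-demo", MFD_CLOEXEC);

    if (fd < 0)
        return -1;
    if (ftruncate(fd, size) < 0)
        goto out;

    switch (how) {
    case TOUCH_NONE:
        break;
    case TOUCH_READ:
        ok = pread(fd, &byte, 1, 0) == 1;
        break;
    case TOUCH_READ_FAULT:
    case TOUCH_WRITE_FAULT: {
        void *map = mmap(NULL, (size_t)size, prot, MAP_SHARED, fd, 0);

        ok = map != MAP_FAILED;
        if (ok) {
            if (how == TOUCH_WRITE_FAULT)
                *(volatile char *)map = 'x';        /* write fault */
            else
                byte = *(volatile const char *)map; /* read fault  */
            munmap(map, (size_t)size);
        }
        break;
    }
    case TOUCH_WRITE:
        ok = pwrite(fd, "x", 1, 0) == 1;
        break;
    case TOUCH_FALLOCATE:
        ok = fallocate(fd, 0, 0, size) == 0;
        break;
    }
    (void)byte;

    if (ok && fstat(fd, &st) == 0)
        blocks = (long long)st.st_blocks;
out:
    close(fd);
    return blocks;
}
```

Calling blocks_after() with a size of a few pages should return 0 for
TOUCH_NONE and TOUCH_READ, and a non-zero count for the read fault, the
write fault, write() and fallocate(). That matches David's correction
above: read() on a hole does not allocate, but a read fault does, so a
seal that only covers writes leaves one implicit allocation path open.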