On 21.07.22 11:44, David Hildenbrand wrote: > On 06.07.22 10:20, Chao Peng wrote: >> Normally, a write to unallocated space of a file or the hole of a sparse >> file automatically causes space allocation, for memfd, this equals to >> memory allocation. This new seal prevents such automatically allocating, >> either this is from a direct write() or a write on the previously >> mmap-ed area. The seal does not prevent fallocate() so an explicit >> fallocate() can still cause allocating and can be used to reserve >> memory. >> >> This is used to prevent unintentional allocation from userspace on a >> stray or careless write and any intentional allocation should use an >> explicit fallocate(). One of the main usecases is to avoid memory double >> allocation for confidential computing usage where we use two memfds to >> back guest memory and at a single point only one memfd is alive and we >> want to prevent memory allocation for the other memfd which may have >> been mmap-ed previously. More discussion can be found at: >> >> https://lkml.org/lkml/2022/6/14/1255 >> >> Suggested-by: Sean Christopherson <seanjc@xxxxxxxxxx> >> Signed-off-by: Chao Peng <chao.p.peng@xxxxxxxxxxxxxxx> >> --- >> include/uapi/linux/fcntl.h | 1 + >> mm/memfd.c | 3 ++- >> mm/shmem.c | 16 ++++++++++++++-- >> 3 files changed, 17 insertions(+), 3 deletions(-) >> >> diff --git a/include/uapi/linux/fcntl.h b/include/uapi/linux/fcntl.h >> index 2f86b2ad6d7e..98bdabc8e309 100644 >> --- a/include/uapi/linux/fcntl.h >> +++ b/include/uapi/linux/fcntl.h >> @@ -43,6 +43,7 @@ >> #define F_SEAL_GROW 0x0004 /* prevent file from growing */ >> #define F_SEAL_WRITE 0x0008 /* prevent writes */ >> #define F_SEAL_FUTURE_WRITE 0x0010 /* prevent future writes while mapped */ >> +#define F_SEAL_AUTO_ALLOCATE 0x0020 /* prevent allocation for writes */ > > Why only "on writes" and not "on reads". IIRC, shmem doesn't support the > shared zeropage, so you'll simply allocate a new page via read() or on > read faults. Correction: on read() we don't allocate a fresh page. But on read faults we would. So this comment here needs clarification. > > > Also, I *think* you can place pages via userfaultfd into shmem. Not sure > if that would count "auto alloc", but it would certainly bypass fallocate(). > -- Thanks, David / dhildenb