On Thu, Jul 21, 2022 at 12:27:03PM +0200, Gupta, Pankaj wrote:
>
> > > Normally, a write to unallocated space of a file or the hole of a sparse
> > > file automatically causes space allocation, for memfd, this equals to
> > > memory allocation. This new seal prevents such automatically allocating,
> > > either this is from a direct write() or a write on the previously
> > > mmap-ed area. The seal does not prevent fallocate() so an explicit
> > > fallocate() can still cause allocating and can be used to reserve
> > > memory.
> > >
> > > This is used to prevent unintentional allocation from userspace on a
> > > stray or careless write and any intentional allocation should use an
> > > explicit fallocate(). One of the main usecases is to avoid memory double
> > > allocation for confidential computing usage where we use two memfds to
> > > back guest memory and at a single point only one memfd is alive and we
> > > want to prevent memory allocation for the other memfd which may have
> > > been mmap-ed previously. More discussion can be found at:
> > >
> > > https://lkml.org/lkml/2022/6/14/1255
> > >
> > > Suggested-by: Sean Christopherson <seanjc@xxxxxxxxxx>
> > > Signed-off-by: Chao Peng <chao.p.peng@xxxxxxxxxxxxxxx>
> > > ---
> > >  include/uapi/linux/fcntl.h |  1 +
> > >  mm/memfd.c                 |  3 ++-
> > >  mm/shmem.c                 | 16 ++++++++++++++--
> > >  3 files changed, 17 insertions(+), 3 deletions(-)
> > >
> > > diff --git a/include/uapi/linux/fcntl.h b/include/uapi/linux/fcntl.h
> > > index 2f86b2ad6d7e..98bdabc8e309 100644
> > > --- a/include/uapi/linux/fcntl.h
> > > +++ b/include/uapi/linux/fcntl.h
> > > @@ -43,6 +43,7 @@
> > >  #define F_SEAL_GROW 0x0004 /* prevent file from growing */
> > >  #define F_SEAL_WRITE 0x0008 /* prevent writes */
> > >  #define F_SEAL_FUTURE_WRITE 0x0010 /* prevent future writes while mapped */
> > > +#define F_SEAL_AUTO_ALLOCATE 0x0020 /* prevent allocation for writes */
> >
> > Why only "on writes" and not "on reads". IIRC, shmem doesn't support the
> > shared zeropage, so you'll simply allocate a new page via read() or on
> > read faults.
> >
> >
> > Also, I *think* you can place pages via userfaultfd into shmem. Not sure
> > if that would count "auto alloc", but it would certainly bypass fallocate().
>
> I was also thinking this at the same time, but for different reason:
>
> "Want to populate private preboot memory with firmware payload", so was
> thinking userfaulftd could be an option as direct writes are restricted?

If that turns out to be a side effect, I'd definitely be glad to see it,
though I'm still not clear how userfaultfd would be particularly helpful
for that.

Chao
>
> Thanks,
> Pankaj
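The commit message above describes the proposed semantics. With the seal set, implicit allocation by write() or a write fault on a hole is refused, while an explicit fallocate() still allocates. Those semantics can be sketched as a small user-space model. This is a hypothetical toy, not the shmem code from the patch. The struct, the model_* functions, the page count and the choice of -EPERM as the error are all assumptions made for illustration. Only the 0x0020 seal value comes from the quoted diff.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Seal value copied from the proposed diff. Hypothetical here: it is not a
 * mainline constant, so never pass it to a real fcntl(F_ADD_SEALS). */
#define MODEL_SEAL_AUTO_ALLOCATE 0x0020u

/* Hypothetical geometry for the toy file. */
#define MODEL_PAGE_SIZE 4096u
#define MODEL_NR_PAGES  4u

/* Toy stand-in for a shmem-backed memfd: one flag per page recording
 * whether memory has been allocated to back it. */
struct model_memfd {
	unsigned int seals;
	bool allocated[MODEL_NR_PAGES];
	unsigned int nr_allocated;
};

/* Reject empty ranges and ranges past the end of the toy file. */
static bool model_range_ok(size_t off, size_t len)
{
	size_t size = (size_t)MODEL_PAGE_SIZE * MODEL_NR_PAGES;

	return len != 0 && off < size && len <= size - off;
}

/* Allocate backing memory for every page in [first, last]. */
static void model_alloc_pages(struct model_memfd *f, size_t first, size_t last)
{
	for (size_t pg = first; pg <= last; pg++) {
		if (!f->allocated[pg]) {
			f->allocated[pg] = true;
			f->nr_allocated++;
		}
	}
}

/* fallocate()-like path: explicit allocation, never blocked by the seal. */
int model_fallocate(struct model_memfd *f, size_t off, size_t len)
{
	if (!model_range_ok(off, len))
		return -EINVAL;
	model_alloc_pages(f, off / MODEL_PAGE_SIZE,
			  (off + len - 1) / MODEL_PAGE_SIZE);
	return 0;
}

/* write()/write-fault-like path: normally fills holes implicitly. With the
 * seal set, any hole in the range fails the whole write and nothing is
 * allocated. -EPERM is this model's choice, not taken from the patch. */
int model_write(struct model_memfd *f, size_t off, size_t len)
{
	size_t first, last;

	if (!model_range_ok(off, len))
		return -EINVAL;
	first = off / MODEL_PAGE_SIZE;
	last = (off + len - 1) / MODEL_PAGE_SIZE;

	if (f->seals & MODEL_SEAL_AUTO_ALLOCATE) {
		for (size_t pg = first; pg <= last; pg++)
			if (!f->allocated[pg])
				return -EPERM;
	}
	model_alloc_pages(f, first, last); /* no-op for already-backed pages */
	return 0;
}
```

In this model, with the seal set, model_write() on a fresh page fails and allocates nothing. A model_fallocate() of that page then succeeds, and the same write goes through. A write that straddles into a still-unallocated page is refused as a whole. That all-or-nothing behaviour is a choice made for this sketch, not something the patch specifies.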