On 06.01.22 14:06, Chao Peng wrote:
> On Tue, Jan 04, 2022 at 03:22:07PM +0100, David Hildenbrand wrote:
>> On 23.12.21 13:29, Chao Peng wrote:
>>> From: "Kirill A. Shutemov" <kirill.shutemov@xxxxxxxxxxxxxxx>
>>>
>>> Introduce a new seal F_SEAL_INACCESSIBLE indicating the content of
>>> the file is inaccessible from userspace in any possible ways like
>>> read(),write() or mmap() etc.
>>>
>>> It provides semantics required for KVM guest private memory support
>>> that a file descriptor with this seal set is going to be used as the
>>> source of guest memory in confidential computing environments such
>>> as Intel TDX/AMD SEV but may not be accessible from host userspace.
>>>
>>> At this time only shmem implements this seal.
>>>
>>> Signed-off-by: Kirill A. Shutemov <kirill.shutemov@xxxxxxxxxxxxxxx>
>>> Signed-off-by: Chao Peng <chao.p.peng@xxxxxxxxxxxxxxx>
>>> ---
>>>  include/uapi/linux/fcntl.h |  1 +
>>>  mm/shmem.c                 | 37 +++++++++++++++++++++++++++++++++++--
>>>  2 files changed, 36 insertions(+), 2 deletions(-)
>>>
>>> diff --git a/include/uapi/linux/fcntl.h b/include/uapi/linux/fcntl.h
>>> index 2f86b2ad6d7e..e2bad051936f 100644
>>> --- a/include/uapi/linux/fcntl.h
>>> +++ b/include/uapi/linux/fcntl.h
>>> @@ -43,6 +43,7 @@
>>>  #define F_SEAL_GROW 0x0004 /* prevent file from growing */
>>>  #define F_SEAL_WRITE 0x0008 /* prevent writes */
>>>  #define F_SEAL_FUTURE_WRITE 0x0010 /* prevent future writes while mapped */
>>> +#define F_SEAL_INACCESSIBLE 0x0020 /* prevent file from accessing */
>>
>> I think this needs more clarification: the file content can still be
>> accessed using in-kernel mechanisms such as MEMFD_OPS for KVM. It
>> effectively disallows traditional access to a file (read/write/mmap)
>> that will result in ordinary MMU access to file content.
>>
>> Not sure how to best clarify that: maybe, prevent ordinary MMU access
>> (e.g., read/write/mmap) to file content?
>
> Or: prevent userspace access (e.g., read/write/mmap) to file content?
The issue with that phrasing is that userspace will be able to access
that content, just via a different mechanism eventually ... e.g., via
the KVM MMU indirectly. If that makes it clearer what I mean :)

>>
>>>  /* (1U << 31) is reserved for signed error codes */
>>>
>>>  /*
>>> diff --git a/mm/shmem.c b/mm/shmem.c
>>> index 18f93c2d68f1..faa7e9b1b9bc 100644
>>> --- a/mm/shmem.c
>>> +++ b/mm/shmem.c
>>> @@ -1098,6 +1098,10 @@ static int shmem_setattr(struct user_namespace *mnt_userns,
>>>          (newsize > oldsize && (info->seals & F_SEAL_GROW)))
>>>              return -EPERM;
>>>
>>> +        if ((info->seals & F_SEAL_INACCESSIBLE) &&
>>> +            (newsize & ~PAGE_MASK))
>>> +            return -EINVAL;
>>> +
>>
>> What happens when sealing and there are existing mmaps?
>
> I think this is similar to ftruncate, in either case we just allow that.
> The existing mmaps will be unmapped and KVM will be notified to
> invalidate the mapping in the secondary MMU as well. This assume we
> trust the userspace even though it can not access the file content.

Can't we simply check+forbid instead?

-- 
Thanks,

David / dhildenb
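
For readers following the quoted shmem_setattr() hunk: the added test
rejects a resize to a size that is not a multiple of the page size
once the proposed seal is set. Below is a minimal userspace sketch of
that arithmetic only, not the kernel code. It assumes 4 KiB pages, and
the SKETCH_* names and check_resize() are hypothetical. The 0x0020
value is copied from the quoted diff and is not a constant from a
released uapi header.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/* Assumption: 4 KiB pages. The kernel uses the architecture's PAGE_SIZE. */
#define SKETCH_PAGE_SIZE 4096ULL
/* Same shape as the kernel's PAGE_MASK: clears the in-page offset bits. */
#define SKETCH_PAGE_MASK (~(SKETCH_PAGE_SIZE - 1))
/* Value taken from the quoted patch; illustrative only. */
#define SKETCH_F_SEAL_INACCESSIBLE 0x0020u

/* The mask trick below only works for power-of-two page sizes. */
static_assert((SKETCH_PAGE_SIZE & (SKETCH_PAGE_SIZE - 1)) == 0,
              "page size must be a power of two");

/*
 * Hypothetical helper mirroring the quoted condition:
 *   (info->seals & F_SEAL_INACCESSIBLE) && (newsize & ~PAGE_MASK)
 * Returns -EINVAL when the seal is set and newsize has a non-zero
 * in-page offset, otherwise 0.
 */
static int check_resize(unsigned int seals, uint64_t newsize)
{
    if ((seals & SKETCH_F_SEAL_INACCESSIBLE) &&
        (newsize & ~SKETCH_PAGE_MASK))
        return -EINVAL;
    return 0;
}
```

Under these assumptions, check_resize(SKETCH_F_SEAL_INACCESSIBLE, 8192)
returns 0 and check_resize(SKETCH_F_SEAL_INACCESSIBLE, 4097) returns
-EINVAL. Without the seal, any size passes this particular check.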