Any comments on this?

On Tue, Aug 18, 2020 at 05:15:48PM +0300, Mike Rapoport wrote:
> From: Mike Rapoport <rppt@xxxxxxxxxxxxx>
>
> Hi,
>
> This is an implementation of "secret" mappings backed by a file descriptor.
>
> v4 changes:
> * rebase on v5.9-rc1
> * Do not redefine PMD_PAGE_ORDER in fs/dax.c, thanks Kirill
> * Make secret mappings exclusive by default and only require flags to
>   memfd_secret() system call for uncached mappings, thanks again Kirill :)
>
> v3 changes:
> * Squash kernel-parameters.txt update into the commit that added the
>   command line option.
> * Make uncached mode explicitly selectable by architectures. For now enable
>   it only on x86.
>
> v2 changes:
> * Follow Michael's suggestion and name the new system call 'memfd_secret'
> * Add kernel-parameters documentation about the boot option
> * Fix i386-tinyconfig regression reported by the kbuild bot.
>   CONFIG_SECRETMEM now depends on !EMBEDDED to disable it on small systems
>   from one side and still make it available unconditionally on
>   architectures that support SET_DIRECT_MAP.
>
>
> The file descriptor backing secret memory mappings is created using a
> dedicated memfd_secret system call. The desired protection mode for the
> memory is configured using the flags parameter of the system call. The
> mmap() of the file descriptor created with memfd_secret() will create a
> "secret" memory mapping. The pages in that mapping will be marked as not
> present in the direct map and will have desired protection bits set in the
> user page table. For instance, the current implementation allows uncached
> mappings.
>
> Although normally Linux userspace mappings are protected from other users,
> such secret mappings are useful for environments where a hostile tenant is
> trying to trick the kernel into giving them access to other tenants'
> mappings.
>
> Additionally, the secret mappings may be used as a means to protect guest
> memory in a virtual machine host.
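
For readers who want to see the flow described above from userspace, here
is a minimal sketch in C. It assumes the interface as documented in
memfd_secret(2) for later mainline kernels (create the descriptor, size it
with ftruncate(), then mmap() it MAP_SHARED), which may differ in detail
from this v4 posting. The helpers memfd_secret_fd(), secret_alloc() and
secret_free() are hypothetical names made up for illustration. The syscall
number comes from the libc headers when they define SYS_memfd_secret;
otherwise the wrapper simply fails with ENOSYS.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Hypothetical wrapper: libc may not ship a memfd_secret() function. */
static int memfd_secret_fd(unsigned int flags)
{
#ifdef SYS_memfd_secret
	return (int)syscall(SYS_memfd_secret, flags);
#else
	(void)flags;
	errno = ENOSYS;	/* headers predate the system call */
	return -1;
#endif
}

/*
 * Hypothetical helper: map len bytes of secret memory.
 * Returns NULL with errno set if any step fails.
 */
static void *secret_alloc(size_t len)
{
	void *p;
	int fd, saved;

	assert(len > 0);

	/* flags == 0: assumed to select the default (exclusive) mode */
	fd = memfd_secret_fd(0);
	if (fd < 0)
		return NULL;

	/* Assumption: the descriptor starts empty and must be sized first. */
	if (ftruncate(fd, (off_t)len) < 0) {
		saved = errno;
		close(fd);
		errno = saved;
		return NULL;
	}

	p = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
	saved = errno;
	close(fd);	/* the mapping stays valid after close() */
	if (p == MAP_FAILED) {
		errno = saved;
		return NULL;
	}
	return p;
}

/* Hypothetical helper: release a mapping obtained from secret_alloc(). */
static void secret_free(void *p, size_t len)
{
	if (p != NULL)
		munmap(p, len);
}
```

A caller would use secret_alloc(4096) like malloc() for key material and
has to be prepared for NULL, for example on a kernel built or booted
without secret memory support.
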
>
> For demonstration of secret memory usage we've created a userspace library
> [1] that does two things. First, it acts as a preloader for openssl that
> redirects all the OPENSSL_malloc calls to secret memory, so any secret keys
> get protected automatically. Second, it exposes the API to the user who
> needs it. We anticipate that a lot of the use cases would be like the
> openssl one: many toolkits that deal with secret keys already have special
> handling for the memory to try to give them greater protection, so this
> would simply be pluggable into the toolkits without any need for user
> application modification.
>
> I hesitated between continuing to use new flags to memfd_create() and
> adding a new system call, and decided on a new system call once I started
> looking into the man page updates. There would have been two completely
> independent descriptions and I think it would have been very confusing.
>
> Hiding secret memory mappings behind an anonymous file allows (ab)use of
> the page cache for tracking pages allocated for the "secret" mappings as
> well as using address_space_operations for e.g. page migration callbacks.
>
> The anonymous file may also be used implicitly, like hugetlb files, to
> implement mmap(MAP_SECRET) and use the secret memory areas with "native" mm
> ABIs in the future.
>
> As the fragmentation of the direct map was one of the major concerns raised
> during the previous postings, I've added an amortizing cache of PMD-size
> pages to each file descriptor and an ability to reserve large chunks of the
> physical memory at boot time and then use this memory as an allocation pool
> for the secret memory areas.
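
To make the amortization argument concrete, here is a toy userspace model
in C. It is not the kernel code from the series: struct secret_pool and
pool_alloc_page() are hypothetical, and the 4 KiB page and 2 MiB PMD sizes
are assumed x86-64 values. The model only counts how often a PMD-size chunk
has to be taken to satisfy a stream of single-page allocations, the idea
being that the direct map is changed once per chunk rather than once per
page.

```c
#include <assert.h>
#include <stddef.h>

/* Assumed x86-64 values: 4 KiB base pages, 2 MiB PMD-size pages. */
#define PAGE_SHIFT	12
#define PMD_SHIFT	21
#define PMD_PAGE_ORDER	(PMD_SHIFT - PAGE_SHIFT)
#define PAGES_PER_CHUNK	(1UL << PMD_PAGE_ORDER)	/* 512 */

/* Hypothetical per-descriptor cache of pages carved from PMD-size chunks. */
struct secret_pool {
	unsigned long free_pages;	/* pages left in the current chunk */
	unsigned long chunk_refills;	/* PMD-size chunks taken so far */
	unsigned long pages_handed_out;	/* single pages given to callers */
};

/*
 * Hand out one page. A new PMD-size chunk is taken only when the cache
 * is empty, so the expensive step runs once per PAGES_PER_CHUNK pages.
 */
static void pool_alloc_page(struct secret_pool *pool)
{
	assert(pool != NULL);

	if (pool->free_pages == 0) {
		pool->chunk_refills++;
		pool->free_pages = PAGES_PER_CHUNK;
	}
	pool->free_pages--;
	pool->pages_handed_out++;
}
```

With these assumed sizes, serving 1000 single-page allocations takes two
chunk refills instead of 1000 separate direct-map updates.
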
>
> v3: https://lore.kernel.org/lkml/20200804095035.18778-1-rppt@xxxxxxxxxx
> v2: https://lore.kernel.org/lkml/20200727162935.31714-1-rppt@xxxxxxxxxx
> v1: https://lore.kernel.org/lkml/20200720092435.17469-1-rppt@xxxxxxxxxx/
> rfc-v2: https://lore.kernel.org/lkml/20200706172051.19465-1-rppt@xxxxxxxxxx/
> rfc-v1: https://lore.kernel.org/lkml/20200130162340.GA14232@rapoport-lnx/
>
> Mike Rapoport (6):
>   mm: add definition of PMD_PAGE_ORDER
>   mmap: make mlock_future_check() global
>   mm: introduce memfd_secret system call to create "secret" memory areas
>   arch, mm: wire up memfd_secret system call were relevant
>   mm: secretmem: use PMD-size pages to amortize direct map fragmentation
>   mm: secretmem: add ability to reserve memory at boot
>
>  arch/Kconfig                           |   7 +
>  arch/arm64/include/asm/unistd.h        |   2 +-
>  arch/arm64/include/asm/unistd32.h      |   2 +
>  arch/arm64/include/uapi/asm/unistd.h   |   1 +
>  arch/riscv/include/asm/unistd.h        |   1 +
>  arch/x86/Kconfig                       |   1 +
>  arch/x86/entry/syscalls/syscall_32.tbl |   1 +
>  arch/x86/entry/syscalls/syscall_64.tbl |   1 +
>  fs/dax.c                               |  11 +-
>  include/linux/pgtable.h                |   3 +
>  include/linux/syscalls.h               |   1 +
>  include/uapi/asm-generic/unistd.h      |   7 +-
>  include/uapi/linux/magic.h             |   1 +
>  include/uapi/linux/secretmem.h         |   8 +
>  kernel/sys_ni.c                        |   2 +
>  mm/Kconfig                             |   4 +
>  mm/Makefile                            |   1 +
>  mm/internal.h                          |   3 +
>  mm/mmap.c                              |   5 +-
>  mm/secretmem.c                         | 451 +++++++++++++++++++++++++
>  20 files changed, 501 insertions(+), 12 deletions(-)
>  create mode 100644 include/uapi/linux/secretmem.h
>  create mode 100644 mm/secretmem.c
>
> --
> 2.26.2
>

--
Sincerely yours,
Mike.