On 19.08.20 13:42, Mike Rapoport wrote:
> On Wed, Aug 19, 2020 at 12:47:54PM +0200, David Hildenbrand wrote:
>> On 18.08.20 16:15, Mike Rapoport wrote:
>>> From: Mike Rapoport <rppt@xxxxxxxxxxxxx>
>>>
>>> Hi,
>>>
>>> This is an implementation of "secret" mappings backed by a file descriptor.
>>>
>>> v4 changes:
>>> * rebase on v5.9-rc1
>>> * Do not redefine PMD_PAGE_ORDER in fs/dax.c, thanks Kirill
>>> * Make secret mappings exclusive by default and only require flags to
>>>   memfd_secret() system call for uncached mappings, thanks again Kirill :)
>>>
>>> v3 changes:
>>> * Squash kernel-parameters.txt update into the commit that added the
>>>   command line option.
>>> * Make uncached mode explicitly selectable by architectures. For now enable
>>>   it only on x86.
>>>
>>> v2 changes:
>>> * Follow Michael's suggestion and name the new system call 'memfd_secret'
>>> * Add kernel-parameters documentation about the boot option
>>> * Fix i386-tinyconfig regression reported by the kbuild bot.
>>>   CONFIG_SECRETMEM now depends on !EMBEDDED to disable it on small systems
>>>   on the one hand, and still make it available unconditionally on
>>>   architectures that support SET_DIRECT_MAP on the other.
>>>
>>>
>>> The file descriptor backing secret memory mappings is created using a
>>> dedicated memfd_secret system call. The desired protection mode for the
>>> memory is configured using the flags parameter of the system call. The
>>> mmap() of the file descriptor created with memfd_secret() will create a
>>> "secret" memory mapping. The pages in that mapping will be marked as not
>>> present in the direct map and will have the desired protection bits set in
>>> the user page table. For instance, the current implementation allows
>>> uncached mappings.
>>>
>>> Although normally Linux userspace mappings are protected from other users,
>>> such secret mappings are useful for environments where a hostile tenant is
>>> trying to trick the kernel into giving them access to other tenants'
>>> mappings.
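To make the described flow concrete (memfd_secret() returns a file descriptor, and mmap() of that descriptor yields the secret mapping), here is a minimal userspace sketch. It assumes the interface as it was later merged (Linux 5.14 and later, exposed via the __NR_memfd_secret constant in the kernel headers), where flags = 0 selects the default mode; the uncached-mode flag discussed in this v4 series is deliberately not shown, since its name is not given in this thread. The helper name secret_alloc() is hypothetical. On kernels without the syscall, or where secret memory is disabled or the memlock limit is exceeded, the call simply fails and errno is reported.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/syscall.h>
#include <unistd.h>

/*
 * Hypothetical helper: returns a pointer to a secret mapping of `len`
 * bytes, or NULL with errno set. Assumes the merged memfd_secret(2)
 * interface; flags = 0 requests the default protection mode.
 */
static void *secret_alloc(size_t len)
{
#ifdef __NR_memfd_secret
    int fd = (int)syscall(__NR_memfd_secret, 0UL);
    if (fd < 0)
        return NULL; /* e.g. ENOSYS if secret memory is disabled */

    /* Size the file first; faults beyond i_size would raise SIGBUS. */
    if (ftruncate(fd, (off_t)len) < 0) {
        int trunc_errno = errno;
        close(fd);
        errno = trunc_errno;
        return NULL;
    }

    /* The secret mapping must be shared; MAP_PRIVATE is rejected. */
    void *p = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    int mmap_errno = errno;
    close(fd); /* the mapping keeps the underlying file alive */
    if (p == MAP_FAILED) {
        errno = mmap_errno;
        return NULL;
    }
    return p;
#else
    /* Kernel headers predate memfd_secret: report it as unsupported. */
    (void)len;
    errno = ENOSYS;
    return NULL;
#endif
}
```

A caller would treat a NULL return as "secret memory unavailable here" and decide whether to fall back to ordinary (for example mlock()ed) memory or refuse to run; the mapping is released with munmap() like any other.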
>>>
>>> Additionally, the secret mappings may be used as a means to protect guest
>>> memory in a virtual machine host.
>>>
>>
>> Just a general question. I assume such pages (where the direct mapping
>> was changed) cannot get migrated - I can spot a simple alloc_page(). So
>> essentially a process can just allocate a whole bunch of memory that is
>> unmovable, correct? Is there any limit? Is it properly accounted towards
>> the process (memctl)?
>
> The memory is accounted in the same way as with mlock(), so a normal
> user won't be able to allocate more than RLIMIT_MEMLOCK.

Okay, thanks. AFAIU the difference to mlock() is that the pages here are
not movable, fragment memory, and limit compaction. Hm.

-- 
Thanks,

David / dhildenb
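Since the reply says secret memory is accounted like mlock() against RLIMIT_MEMLOCK, a process can estimate up front whether a secret allocation will fit. The sketch below is a simplified model under stated assumptions: it works in bytes, whereas the kernel compares page counts, and it ignores the CAP_IPC_LOCK capability, which exempts a process from the mlock() limit. Both helper names, memlock_would_exceed() and current_memlock_limit(), are hypothetical; only getrlimit(RLIMIT_MEMLOCK, ...) is a real API.

```c
#include <stdbool.h>
#include <stddef.h>
#include <sys/resource.h>

/*
 * Hypothetical helper: would locking `request` more bytes, on top of
 * `already_locked` bytes, exceed a RLIMIT_MEMLOCK soft limit of `limit`?
 * Simplification: byte granularity; the kernel compares pages.
 */
static bool memlock_would_exceed(rlim_t limit, size_t already_locked,
                                 size_t request)
{
    if (limit == RLIM_INFINITY)
        return false;
    if ((rlim_t)already_locked > limit)
        return true;
    /* Compare against the remaining budget to avoid overflowing a sum. */
    return (rlim_t)request > limit - (rlim_t)already_locked;
}

/* Hypothetical helper: reads the caller's RLIMIT_MEMLOCK soft limit. */
static int current_memlock_limit(rlim_t *out)
{
    struct rlimit rl;

    if (getrlimit(RLIMIT_MEMLOCK, &rl) != 0)
        return -1;
    *out = rl.rlim_cur; /* the soft limit is the one enforced */
    return 0;
}
```

Note that this only models the quantity limit Mike describes; it says nothing about David's remaining concern, which is that these pages, unlike mlock()ed ones, are not movable and can therefore fragment memory and hinder compaction.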