On Fri, Nov 19, 2021 at 02:51:11PM +0100, David Hildenbrand wrote:
> On 19.11.21 14:47, Chao Peng wrote:
> > From: "Kirill A. Shutemov" <kirill.shutemov@xxxxxxxxxxxxxxx>
> >
> > The new seal type provides semantics required for KVM guest private
> > memory support. A file descriptor with the seal set is going to be used
> > as source of guest memory in confidential computing environments such as
> > Intel TDX and AMD SEV.
> >
> > F_SEAL_GUEST can only be set on empty memfd. After the seal is set
> > userspace cannot read, write or mmap the memfd.
> >
> > Userspace is in charge of guest memory lifecycle: it can allocate the
> > memory with falloc or punch hole to free memory from the guest.
> >
> > The file descriptor passed down to KVM as guest memory backend. KVM
> > register itself as the owner of the memfd via memfd_register_guest().
> >
> > KVM provides callback that needed to be called on fallocate and punch
> > hole.
> >
> > memfd_register_guest() returns callbacks that need be used for
> > requesting a new page from memfd.
> >
>
> Repeating the feedback I already shared in a private mail thread:
>
>
> As long as page migration / swapping is not supported, these pages
> behave like any longterm pinned pages (e.g., VFIO) or secretmem pages.
>
> 1. These pages are not MOVABLE. They must not end up on ZONE_MOVABLE or
> MIGRATE_CMA.
>
> That should be easy to handle, you have to adjust the gfp_mask to
> 	mapping_set_gfp_mask(inode->i_mapping, GFP_HIGHUSER);
> just as mm/secretmem.c:secretmem_file_create() does.

Okay, fair enough. mapping_set_unevictable() also makes sense.

> 2. These pages behave like mlocked pages and should be accounted as such.
>
> This is probably where the accounting "fun" starts, but maybe it's
> easier than I think to handle.
>
> See mm/secretmem.c:secretmem_mmap(), where we account the pages as
> VM_LOCKED and will consequently check per-process mlock limits. As we
> don't mmap(), the same approach cannot be reused.
>
> See drivers/vfio/vfio_iommu_type1.c:vfio_pin_map_dma() and
> vfio_pin_pages_remote() on how to manually account via mm->locked_vm .
>
> But it's a bit hairy because these pages are not actually mapped into
> the page tables of the MM, so it might need some thought. Similarly,
> these pages actually behave like "pinned" (as in mm->pinned_vm), but we
> just don't increase the refcount AFAIR. Again, accounting really is a
> bit hairy ...

Accounting is fun indeed. Non-mapped mlocked memory is going to be
confusing. Hm...

I will look closer.

--
 Kirill A. Shutemov
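
For reference, the manual mm->locked_vm accounting mentioned above can be
modeled in plain userspace C. This is only a rough sketch, not kernel code
and not what the patch does: struct mm_model, account_locked_pages() and
unaccount_locked_pages() are hypothetical names made up for illustration,
and the limit check only loosely follows the "charge before use, refuse if
over RLIMIT_MEMLOCK unless privileged" pattern. Which event would charge
and which would uncharge (for example fallocate and punch hole) is an
assumption, not something settled in this thread.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/*
 * Hypothetical stand-in for the few per-process fields that matter for
 * locked-memory accounting. The real struct mm_struct is far larger.
 */
struct mm_model {
	unsigned long locked_vm;   /* pages currently charged as locked */
	unsigned long lock_limit;  /* models RLIMIT_MEMLOCK, in pages */
	bool can_ipc_lock;         /* models a CAP_IPC_LOCK check */
};

/*
 * Charge npages against the limit. On failure nothing is charged, so the
 * caller can simply refuse the allocation.
 */
static int account_locked_pages(struct mm_model *mm, unsigned long npages)
{
	if (!mm->can_ipc_lock) {
		/* Written as a subtraction so the check cannot overflow. */
		if (mm->locked_vm > mm->lock_limit ||
		    npages > mm->lock_limit - mm->locked_vm)
			return -ENOMEM;
	}
	mm->locked_vm += npages;
	return 0;
}

/* Undo an earlier charge, e.g. when the pages are freed again. */
static void unaccount_locked_pages(struct mm_model *mm, unsigned long npages)
{
	assert(mm->locked_vm >= npages);
	mm->locked_vm -= npages;
}
```

With a limit of 16 pages, charging 10 pages succeeds, a further 7 is
refused with -ENOMEM and leaves the counter untouched, and the same 7
succeeds once 4 pages have been uncharged. The model says nothing about
the hard part raised above: the pages are not mapped into the page tables
of the mm that gets charged.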