On Thu, Apr 07, 2022, Andy Lutomirski wrote: > > On Thu, Apr 7, 2022, at 9:05 AM, Sean Christopherson wrote: > > On Thu, Mar 10, 2022, Chao Peng wrote: > >> Since page migration / swapping is not supported yet, MFD_INACCESSIBLE > >> memory behave like longterm pinned pages and thus should be accounted to > >> mm->pinned_vm and be restricted by RLIMIT_MEMLOCK. > >> > >> Signed-off-by: Chao Peng <chao.p.peng@xxxxxxxxxxxxxxx> > >> --- > >> mm/shmem.c | 25 ++++++++++++++++++++++++- > >> 1 file changed, 24 insertions(+), 1 deletion(-) > >> > >> diff --git a/mm/shmem.c b/mm/shmem.c > >> index 7b43e274c9a2..ae46fb96494b 100644 > >> --- a/mm/shmem.c > >> +++ b/mm/shmem.c > >> @@ -915,14 +915,17 @@ static void notify_fallocate(struct inode *inode, pgoff_t start, pgoff_t end) > >> static void notify_invalidate_page(struct inode *inode, struct folio *folio, > >> pgoff_t start, pgoff_t end) > >> { > >> -#ifdef CONFIG_MEMFILE_NOTIFIER > >> struct shmem_inode_info *info = SHMEM_I(inode); > >> > >> +#ifdef CONFIG_MEMFILE_NOTIFIER > >> start = max(start, folio->index); > >> end = min(end, folio->index + folio_nr_pages(folio)); > >> > >> memfile_notifier_invalidate(&info->memfile_notifiers, start, end); > >> #endif > >> + > >> + if (info->xflags & SHM_F_INACCESSIBLE) > >> + atomic64_sub(end - start, ¤t->mm->pinned_vm); > > > > As Vishal's to-be-posted selftest discovered, this is broken as current->mm > > may be NULL. Or it may be a completely different mm, e.g. AFAICT there's > > nothing that prevents a different process from punching hole in the shmem > > backing. > > > > How about just not charging the mm in the first place? There’s precedent: > ramfs and hugetlbfs (at least sometimes — I’ve lost track of the current > status). > > In any case, for an administrator to try to assemble the various rlimits into > a coherent policy is, and always has been, quite messy. ISTM cgroup limits, > which can actually add across processes usefully, are much better. > > So, aside from the fact that these fds aren’t in a filesystem and are thus > available by default, I’m not convinced that this accounting is useful or > necessary. > > Maybe we could just have some switch require to enable creation of private > memory in the first place, and anyone who flips that switch without > configuring cgroups is subject to DoS. I personally have no objection to that, and I'm 99% certain Google doesn't rely on RLIMIT_MEMLOCK.