(cc Brendan's other email address, hoping for review input ;)) On Mon, 5 Aug 2019 13:04:47 -0400 "Joel Fernandes (Google)" <joel@xxxxxxxxxxxxxxxxx> wrote: > The page_idle tracking feature currently requires looking up the pagemap > for a process followed by interacting with /sys/kernel/mm/page_idle. > Looking up PFN from pagemap in Android devices is not supported by > unprivileged process and requires SYS_ADMIN and gives 0 for the PFN. > > This patch adds support to directly interact with page_idle tracking at > the PID level by introducing a /proc/<pid>/page_idle file. It follows > the exact same semantics as the global /sys/kernel/mm/page_idle, but now > looking up PFN through pagemap is not needed since the interface uses > virtual frame numbers, and at the same time also does not require > SYS_ADMIN. > > In Android, we are using this for the heap profiler (heapprofd) which > profiles and pin points code paths which allocates and leaves memory > idle for long periods of time. This method solves the security issue > with userspace learning the PFN, and while at it is also shown to yield > better results than the pagemap lookup, the theory being that the window > where the address space can change is reduced by eliminating the > intermediate pagemap look up stage. In virtual address indexing, the > process's mmap_sem is held for the duration of the access. Quite a lot of changes to the page_idle code. Has this all been runtime tested on architectures where CONFIG_HAVE_ARCH_PTE_SWP_PGIDLE=n? That could be x86 with a little Kconfig fiddle-for-testing-purposes. > 8 files changed, 376 insertions(+), 45 deletions(-) Quite a lot of new code unconditionally added to major architectures. Are we confident that everyone will want this feature? > > ... > > +static int proc_page_idle_open(struct inode *inode, struct file *file) > +{ > + struct mm_struct *mm; > + > + mm = proc_mem_open(inode, PTRACE_MODE_READ); > + if (IS_ERR(mm)) > + return PTR_ERR(mm); > + file->private_data = mm; > + return 0; > +} > + > +static int proc_page_idle_release(struct inode *inode, struct file *file) > +{ > + struct mm_struct *mm = file->private_data; > + > + if (mm) I suspect the test isn't needed? proc_page_idle_release) won't be called if proc_page_idle_open() failed? > + mmdrop(mm); > + return 0; > +} > > ... >