On Tue, Jul 23, 2019 at 01:10:05PM +0300, Konstantin Khlebnikov wrote:
> On 23.07.2019 11:43, Konstantin Khlebnikov wrote:
> > On 23.07.2019 0:32, Joel Fernandes (Google) wrote:
> > > The page_idle tracking feature currently requires looking up the pagemap
> > > for a process followed by interacting with /sys/kernel/mm/page_idle.
> > > This is quite cumbersome and can be error-prone too. If something
> > > changes with the page between accessing the per-PID pagemap and the
> > > global page_idle bitmap, then the information is not accurate.
> > > Moreover, looking up the PFN from pagemap on Android devices is not
> > > supported for unprivileged processes: it requires SYS_ADMIN, and
> > > otherwise gives 0 for the PFN.
> > >
> > > This patch adds support to directly interact with page_idle tracking at
> > > the PID level by introducing a /proc/<pid>/page_idle file. This
> > > eliminates the need for userspace to calculate the mapping of the page.
> > > It follows the exact same semantics as the global
> > > /sys/kernel/mm/page_idle; however, it is easier to use for some use cases
> > > where looking up the PFN is not needed, and it does not require SYS_ADMIN.
> > > It ended up simplifying userspace code, solved the security issue
> > > mentioned, and works quite well. SELinux does not need to be turned off
> > > since no pagemap lookup is needed.
> > >
> > > In Android, we are using this for the heap profiler (heapprofd), which
> > > profiles and pinpoints code paths that allocate and leave memory
> > > idle for long periods of time.
> > >
> > > Documentation material:
> > > The idle page tracking API for virtual address indexing using virtual page
> > > frame numbers (VFN) is located at /proc/<pid>/page_idle. It is a bitmap
> > > that follows the same semantics as /sys/kernel/mm/page_idle/bitmap
> > > except that it uses virtual instead of physical frame numbers.
> > >
> > > This idle page tracking API can be simpler to use than physical address
> > > indexing, since the pagemap for a process does not need to be looked up
> > > to mark or read a page's idle bit. It is also more accurate than
> > > physical address indexing, since with physical address indexing, address
> > > space changes can occur between reading the pagemap and reading the
> > > bitmap. In virtual address indexing, the process's mmap_sem is held for
> > > the duration of the access.
> >
> > Maybe integrate this into the existing interfaces: /proc/pid/clear_refs and
> > /proc/pid/pagemap?
> >
> > I.e. echo X > /proc/pid/clear_refs clears reference bits in ptes and
> > marks pages idle only for pages mapped in this process.
> > And the idle bit in /proc/pid/pagemap tells that the page is still idle in this process.
> > This is faster: we don't need to walk the whole rmap for that.
> > Moreover, this is so cheap that it could be counted and shown in smaps.
> Unlike clearing the real access bits, this does not disrupt the memory reclaimer.
> Killer feature.

I replied to your patch:
https://lore.kernel.org/lkml/20190723134647.GA104199@xxxxxxxxxx/T/#med8992e75c32d9c47f95b119d24a43ded36420bc
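For readers unfamiliar with the bitmap semantics the proposed /proc/<pid>/page_idle file inherits, here is a minimal C sketch of the indexing. It assumes the layout documented for /sys/kernel/mm/page_idle/bitmap (an array of native-endian 64-bit words, bit vfn % 64 of word vfn / 64, accessed in 8-byte aligned units, with written words OR-ed into the bitmap), and it assumes VFN = virtual address / page size. All helper names are hypothetical, and /proc/<pid>/page_idle is the file proposed in this patch, not an interface guaranteed to exist on a given kernel.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdint.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helpers for the proposed /proc/<pid>/page_idle bitmap,
 * assuming the same layout as /sys/kernel/mm/page_idle/bitmap but
 * indexed by virtual frame number (VFN) instead of PFN. */

/* VFN of a virtual address, assuming VFN = vaddr / page_size. */
static uint64_t addr_to_vfn(uint64_t vaddr, uint64_t page_size)
{
	assert(page_size != 0);
	return vaddr / page_size;
}

/* File offset of the 64-bit word holding vfn's bit (always 8-byte aligned). */
static off_t idle_word_offset(uint64_t vfn)
{
	return (off_t)((vfn / 64) * sizeof(uint64_t));
}

/* Position of vfn's bit inside that word. */
static unsigned int idle_bit(uint64_t vfn)
{
	return (unsigned int)(vfn % 64);
}

/* Returns 1 if the page is idle, 0 if it was accessed, -1 on error. */
static int read_idle_bit(int fd, uint64_t vfn)
{
	uint64_t word;

	if (pread(fd, &word, sizeof(word), idle_word_offset(vfn)) != (ssize_t)sizeof(word))
		return -1;
	return (int)((word >> idle_bit(vfn)) & 1);
}

/* Marks one page idle. Assumes written words are OR-ed into the bitmap,
 * as for the global file, so the zero bits leave other pages untouched. */
static int mark_idle(int fd, uint64_t vfn)
{
	uint64_t word = UINT64_C(1) << idle_bit(vfn);

	if (pwrite(fd, &word, sizeof(word), idle_word_offset(vfn)) != (ssize_t)sizeof(word))
		return -1;
	return 0;
}
```

Under these assumptions, a profiler would open /proc/<pid>/page_idle, call mark_idle() for the pages of interest, let the workload run, and then call read_idle_bit(); a bit that is still set means the page was not accessed in the meantime. No pagemap lookup or PFN is involved, which is the point of the per-PID interface.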