On Tue, Sep 01, 2020 at 09:44:59PM +0530, Sumit Semwal wrote:
> From: Colin Cross <ccross@xxxxxxxxxx>
>
> In many userspace applications, and especially in VM based applications
> like Android uses heavily, there are multiple different allocators in use.
> At a minimum there is libc malloc and the stack, and in many cases there
> are libc malloc, the stack, direct syscalls to mmap anonymous memory, and
> multiple VM heaps (one for small objects, one for big objects, etc.).
> Each of these layers usually has its own tools to inspect its usage;
> malloc by compiling a debug version, the VM through heap inspection tools,
> and for direct syscalls there is usually no way to track them.
>
> On Android we heavily use a set of tools that use an extended version of
> the logic covered in Documentation/vm/pagemap.txt to walk all pages mapped
> in userspace and slice their usage by process, shared (COW) vs. unique
> mappings, backing, etc. This can account for real physical memory usage
> even in cases like fork without exec (which Android uses heavily to share
> as many private COW pages as possible between processes), Kernel SamePage
> Merging, and clean zero pages. It produces a measurement of the pages
> that only exist in that process (USS, for unique), and a measurement of
> the physical memory usage of that process with the cost of shared pages
> being evenly split between processes that share them (PSS).
>
> If all anonymous memory is indistinguishable then figuring out the real
> physical memory usage (PSS) of each heap requires either a pagemap walking
> tool that can understand the heap debugging of every layer, or for every
> layer's heap debugging tools to implement the pagemap walking logic, in
> which case it is hard to get a consistent view of memory across the whole
> system.
>
> Tracking the information in userspace leads to all sorts of problems.
> It either needs to be stored inside the process, which means every
> process has to have an API to export its current heap information upon
> request, or it has to be stored externally in a filesystem that
> somebody needs to clean up on crashes. It needs to be readable while
> the process is still running, so it has to have some sort of
> synchronization with every layer of userspace. Efficiently tracking
> the ranges requires reimplementing something like the kernel vma
> trees, and linking to it from every layer of userspace. It requires
> more memory, more syscalls, more runtime cost, and more complexity to
> separately track regions that the kernel is already tracking.
>
> This patch adds a field to /proc/pid/maps and /proc/pid/smaps to show a
> userspace-provided name for anonymous vmas. The names of named anonymous
> vmas are shown in /proc/pid/maps and /proc/pid/smaps as [anon:<name>].

Hm. I guess that there might be tools that expect the field to be empty for
anonymous memory, no?

> Userspace can set the name for a region of memory by calling
> prctl(PR_SET_VMA, PR_SET_VMA_ANON_NAME, start, len, (unsigned long)name);
> Setting the name to NULL clears it.
>
> The name is stored in a user pointer in the shared union in vm_area_struct
> that points to a null terminated string inside the user process. vmas
> that point to the same address and are otherwise mergeable will be merged,
> but vmas that point to equivalent strings at different addresses will not
> be merged.
>
> The idea to store a userspace pointer to reduce the complexity within mm
> (at the expense of the complexity of reading /proc/pid/mem) came from Dave
> Hansen. This results in no runtime overhead in the mm subsystem other
> than comparing the anon_name pointers when considering vma merging. The
> pointer is stored in a union with fields that are only used on file-backed
> mappings, so it does not increase memory usage.
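
To make the interface quoted above concrete, here is a minimal sketch of a
caller. It is not code from the patch. The helper alloc_named() and the name
"example-heap" are made up for illustration, and the fallback constant values
are the ones I believe the Android kernels use, so treat them as an
assumption. The name is kept in static storage because, as described above,
the kernel only keeps the user pointer.

```c
#define _GNU_SOURCE
#include <stddef.h>
#include <sys/mman.h>
#include <sys/prctl.h>

/* Assumed values, only used if the installed headers lack them. */
#ifndef PR_SET_VMA
#define PR_SET_VMA 0x53564d41
#endif
#ifndef PR_SET_VMA_ANON_NAME
#define PR_SET_VMA_ANON_NAME 0
#endif

/*
 * Hypothetical name. It lives in static storage so the pointer stays
 * valid for as long as the mapping carries the name.
 */
static const char heap_name[] = "example-heap";

/*
 * Map len bytes of anonymous memory and try to name the region.
 * Returns the mapping, or NULL if mmap() failed. *named is set to 1
 * if the kernel accepted the name and to 0 otherwise (for example on
 * a kernel without this patch); the mapping is usable either way.
 */
static void *alloc_named(size_t len, const char *name, int *named)
{
	void *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
		       MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);

	if (p == MAP_FAILED)
		return NULL;

	*named = prctl(PR_SET_VMA, (unsigned long)PR_SET_VMA_ANON_NAME,
		       (unsigned long)p, (unsigned long)len,
		       (unsigned long)name) == 0;
	return p;
}
```

Calling alloc_named(4096, heap_name, &named) and then reading
/proc/self/maps should show the region as [anon:example-heap] when named
ends up as 1. Passing two separately allocated copies of the same string
for adjacent regions would, per the description above, keep the vmas from
merging.
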
> (Upstream changed to remove the union, so this patch adds it back as well)

IIUC, it gives userspace direct control of content of /proc/$PID/maps and
/proc/$PID/smaps. There's no verification of the given string whatsoever.
I'm sure security experts would find clever usage of the feature :P

-- 
 Kirill A. Shutemov