+jeffv from Android On Tue, Jan 26, 2021 at 11:51 PM Kalesh Singh <kaleshsingh@xxxxxxxxxx> wrote: > In order to measure how much memory a process actually consumes, it is > necessary to include the DMA buffer sizes for that process in the memory > accounting. Since the handle to DMA buffers are raw FDs, it is important > to be able to identify which processes have FD references to a DMA buffer. Or you could try to let the DMA buffer take a reference on the mm_struct and account its size into the mm_struct? That would probably be nicer to work with than having to poke around in procfs separately for DMA buffers. > Currently, DMA buffer FDs can be accounted using /proc/<pid>/fd/* and > /proc/<pid>/fdinfo -- both of which are only root readable, as follows: That's not quite right. They can both also be accessed by the user owning the process. Also, fdinfo is a standard interface for inspecting process state that doesn't permit reading process memory or manipulating process state - so I think it would be fine to permit access to fdinfo under a PTRACE_MODE_READ_FSCRED check, just like the interface you're suggesting. > 1. Do a readlink on each FD. > 2. If the target path begins with "/dmabuf", then the FD is a dmabuf FD. > 3. stat the file to get the dmabuf inode number. > 4. Read/ proc/<pid>/fdinfo/<fd>, to get the DMA buffer size. > > Android captures per-process system memory state when certain low memory > events (e.g a foreground app kill) occur, to identify potential memory > hoggers. To include a process’s dmabuf usage as part of its memory state, > the data collection needs to be fast enough to reflect the memory state at > the time of such events. > > Since reading /proc/<pid>/fd/ and /proc/<pid>/fdinfo/ requires root > privileges, this approach is not suitable for production builds. It should be easy to add enough information to /proc/<pid>/fdinfo/ so that you don't need to look at /proc/<pid>/fd/ anymore. > Granting > root privileges even to a system process increases the attack surface and > is highly undesirable. Additionally this is slow as it requires many > context switches for searching and getting the dma-buf info. What do you mean by "context switches"? Task switches or kernel/user transitions (e.g. via syscall)? > With the addition of per-buffer dmabuf stats in sysfs [1], the DMA buffer > details can be queried using their unique inode numbers. > > This patch proposes adding a /proc/<pid>/task/<tid>/dmabuf_fds interface. > > /proc/<pid>/task/<tid>/dmabuf_fds contains a list of inode numbers for > every DMA buffer FD that the task has. Entries with the same inode > number can appear more than once, indicating the total FD references > for the associated DMA buffer. > > If a thread shares the same files as the group leader then its > dmabuf_fds file will be empty, as these dmabufs are reported by the > group leader. > > The interface requires PTRACE_MODE_READ_FSCRED (same as /proc/<pid>/maps) > and allows the efficient accounting of per-process DMA buffer usage without > requiring root privileges. (See data below) I'm not convinced that introducing a new procfs file for this is the right way to go. And the idea of having to poke into multiple different files in procfs and in sysfs just to be able to compute a proper memory usage score for a process seems weird to me. "How much memory is this process using" seems like the kind of question the kernel ought to be able to answer (and the kernel needs to be able to answer somewhat accurately so that its own OOM killer can do its job properly)?