On 11/26/2018 11:20 AM, Steven Sistare wrote:
On 11/9/2018 11:48 PM, Prakash Sangappa wrote:

What do others think? How to proceed on this?

Summarizing the discussion so far:

The use case for getting VA (Virtual Address) to NUMA node information is performance analysis. Investigating performance issues involves looking at where a process's memory is allocated from (which NUMA node). For the user analyzing the issue, an efficient way to get this information is useful when looking at application processes that have a large address space.

The patch proposed adding a /proc/<pid>/numa_vamaps file to provide VA to NUMA node id mapping information for a process. This file provides address range to NUMA node id info. An address range that has no pages mapped is indicated with '-' for the NUMA node id.

Sample file content:

    00400000-00410000 N1
    00410000-0047f000 N0
    00480000-00481000 -
    00481000-004a0000 N0
    ..

Dave Hansen asked how it would scale with respect to reading this file from a large process. The answer is that the file contents are generated using a page table walk and copied to the user buffer. The mmap_sem lock is dropped and re-acquired in the process of walking the page table and copying the file content. The kernel buffer size used determines how long the lock is held. This can be further improved to drop the lock and re-acquire it after a fixed number (512) of pages are walked.

Also, with support for seeking to a specific VA of the process from where the VA to NUMA node information will be provided, the file offset is not taken into consideration. This behavior is unlike reading a normal file. Other /proc files (e.g. /proc/<pid>/pagemap) also have certain differences compared to reading a normal file.

Michal Hocko suggested that the currently available 'move_pages' API could be used to collect the VA to NUMA node id information. However, using the numa_vamaps /proc file will be more efficient than move_pages().
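
To make the proposed file format concrete, here is a minimal sketch of how an analysis tool might consume it. It assumes the line layout is exactly as in the sample content quoted above (hex start-end, then 'N<id>' or '-'). The names parse_numa_vamaps and bytes_per_node are hypothetical helpers for illustration, not part of the patch.

```python
import re
from typing import Dict, List, NamedTuple, Optional


class Range(NamedTuple):
    start: int            # first byte of the range
    end: int              # one past the last byte of the range
    node: Optional[int]   # NUMA node id, or None when no pages are mapped ('-')


# Assumed line format, taken from the sample in the mail: "<hex>-<hex> N<id>" or "<hex>-<hex> -"
_LINE = re.compile(r"^([0-9a-fA-F]+)-([0-9a-fA-F]+)\s+(?:N(\d+)|-)$")


def parse_numa_vamaps(text: str) -> List[Range]:
    """Parse numa_vamaps-style text into a list of address ranges."""
    ranges = []
    for line in text.splitlines():
        m = _LINE.match(line.strip())
        if m is None:
            continue  # skip blank or unrecognized lines
        node = int(m.group(3)) if m.group(3) is not None else None
        ranges.append(Range(int(m.group(1), 16), int(m.group(2), 16), node))
    return ranges


def bytes_per_node(ranges: List[Range]) -> Dict[Optional[int], int]:
    """Sum the size of the address ranges reported for each node."""
    totals: Dict[Optional[int], int] = {}
    for r in ranges:
        totals[r.node] = totals.get(r.node, 0) + (r.end - r.start)
    return totals


SAMPLE = """\
00400000-00410000 N1
00410000-0047f000 N0
00480000-00481000 -
00481000-004a0000 N0
"""

if __name__ == "__main__":
    for node, size in bytes_per_node(parse_numa_vamaps(SAMPLE)).items():
        label = "-" if node is None else f"N{node}"
        print(f"{label}: {size:#x} bytes")
```

On a system carrying the patch, the text would presumably come from reading /proc/<pid>/numa_vamaps rather than from the SAMPLE string.
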
Steven Sistare suggested optimizing move_pages() for the case when consecutive 4k page addresses are passed in. I tried out this optimization, and the above-mentioned table shows the performance comparison of the move_pages() API vs the 'numa_vamaps' /proc file. Specifically, in the case of sparse mappings the optimization to move_pages() does not help. The performance benefits seen with the /proc file will make a difference from a usability point of view.

Andrew Morton had asked about the performance difference between the move_pages() API and use of the 'numa_vamaps' /proc file, and also about the use case for getting VA to NUMA node id information. I hope the above description answers the questions.
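
For comparison, a tool built on move_pages() gets one status value per queried page and has to build the ranges itself. The sketch below shows only that post-processing step, under these assumptions: 4k pages, consecutive page addresses, and a status list shaped like what move_pages() fills in when called with a NULL nodes argument (a node id for a present page, a negative errno otherwise). The function names are hypothetical, and the statuses in the usage lines are made-up illustrative values, not measurements.

```python
from typing import Iterable, List, Optional, Tuple

PAGE_SIZE = 4096  # assumption: 4k base pages, as in the discussion

# (start, end, node); node is None when the page has no node (negative status)
NodeRange = Tuple[int, int, Optional[int]]


def coalesce(start: int, statuses: Iterable[int],
             page_size: int = PAGE_SIZE) -> List[NodeRange]:
    """Merge per-page statuses for consecutive pages into address ranges."""
    ranges: List[NodeRange] = []
    addr = start
    for status in statuses:
        node = status if status >= 0 else None  # negative errno: treat as unmapped
        if ranges and ranges[-1][2] == node:
            # Same node as the previous page: extend the current range.
            ranges[-1] = (ranges[-1][0], addr + page_size, node)
        else:
            ranges.append((addr, addr + page_size, node))
        addr += page_size
    return ranges


def format_ranges(ranges: List[NodeRange]) -> str:
    """Render ranges in the same layout as the numa_vamaps sample."""
    lines = []
    for start, end, node in ranges:
        label = "-" if node is None else f"N{node}"
        lines.append(f"{start:08x}-{end:08x} {label}")
    return "\n".join(lines)


if __name__ == "__main__":
    # Illustrative statuses only: two pages on node 1, one on node 0,
    # two not present (-2 is -ENOENT), one on node 0.
    print(format_ranges(coalesce(0x400000, [1, 1, 0, -2, -2, 0])))
```

Note that the input grows with the number of pages queried, including pages that turn out not to be present, whereas the /proc file reports an unmapped range as a single line.
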