Re: [PATCH V2 0/6] VA to numa node information

On 11/26/2018 11:20 AM, Steven Sistare wrote:
> On 11/9/2018 11:48 PM, Prakash Sangappa wrote:
>> Here is some data from pmap using the move_pages() API with the
>> optimization. The following table compares the time pmap takes to
>> print the address mappings of a large process, with numa node
>> information, using the move_pages() API vs. pmap using the /proc
>> numa_vamaps file.
>>
>> Running the pmap command on a process with 1.3 TB of address space,
>> with sparse mappings:
>>
>>                          ~1.3 TB sparse    250G dense (hugepages)
>> move_pages               8.33s             3.14s
>> optimized move_pages     6.29s             0.92s
>> /proc numa_vamaps        0.08s             0.04s
>>
>> The second column is the pmap time on a 250G address range of this
>> process, which maps hugepages (THP & hugetlb).
>
> The data look compelling to me.  numa_vamaps provides a much smoother
> user experience for the analyst who is casting a wide net looking for
> the root of a performance issue.  Almost no waiting to see the data.
>
> - Steve

What do others think? How should we proceed on this?

Summarizing the discussion so far:

The use case for getting VA (virtual address) to numa node information
is performance analysis. Investigating a performance issue involves
looking at which numa node a process's memory is allocated from. For
the user analyzing the issue, an efficient way to get this information
is useful when looking at application processes with large address
spaces.

The patch proposes adding a /proc/<pid>/numa_vamaps file that provides
VA to numa node id mapping information for a process, as address range
to numa node id records. An address range with no pages mapped is
indicated with '-' in place of the numa node id. Sample file content:

00400000-00410000 N1
00410000-0047f000 N0
00480000-00481000 -
00481000-004a0000 N0
..
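
As an aside, here is a minimal user-space sketch in C of reading and
parsing this output; it assumes exactly the record layout shown above
and keeps error handling minimal:

#include <stdio.h>
#include <stdlib.h>

int main(int argc, char **argv)
{
	char path[64], line[128];
	FILE *f;

	if (argc < 2)
		return 1;
	snprintf(path, sizeof(path), "/proc/%s/numa_vamaps", argv[1]);
	f = fopen(path, "r");
	if (!f) {
		perror("fopen");
		return 1;
	}
	while (fgets(line, sizeof(line), f)) {
		unsigned long start, end;
		int node;

		/* "start-end Nid", or "start-end -" for no pages mapped */
		if (sscanf(line, "%lx-%lx N%d", &start, &end, &node) == 3)
			printf("%#lx..%#lx on node %d\n", start, end, node);
		else if (sscanf(line, "%lx-%lx -", &start, &end) == 2)
			printf("%#lx..%#lx no pages mapped\n", start, end);
	}
	fclose(f);
	return 0;
}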
Dave Hansen asked how this would scale with respect to reading the file
from a large process. The answer is that the file contents are
generated with a page table walk and copied to the user buffer. The
mmap_sem lock is dropped and re-acquired in the process of walking the
page table and copying file content, so the kernel buffer size
determines how long the lock is held. This can be further improved by
dropping the lock and re-acquiring it after a fixed number (512) of
pages has been walked, as in the sketch below.
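
A rough sketch of that loop, for illustration only. This is not the
patch code; walk_batch() is a hypothetical helper standing in for the
real page table walk that formats "start-end Nid" records into the
kernel buffer:

#define WALK_BATCH	512	/* pages walked per mmap_sem hold */

/* Hypothetical helper: advances *va by up to 'batch' pages, appends
 * records to kbuf, returns bytes written (0 at end of address space). */
static ssize_t walk_batch(struct mm_struct *mm, unsigned long *va,
			  unsigned long batch, char *kbuf, size_t len);

static ssize_t numa_vamaps_read_sketch(struct mm_struct *mm,
		unsigned long *va, char *kbuf, size_t len)
{
	ssize_t copied = 0, n;

	while (copied < len) {
		down_read(&mm->mmap_sem);
		n = walk_batch(mm, va, WALK_BATCH, kbuf + copied,
			       len - copied);
		up_read(&mm->mmap_sem);	/* bounded hold time */

		if (n <= 0)
			break;
		copied += n;
		cond_resched();		/* yield between batches */
	}
	return copied;
}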

Also, to support seeking to a specific VA of the process from which the
VA to numa node information is reported, the file offset is interpreted
as a virtual address rather than a byte offset into the file contents.
This behavior is unlike reading a normal file, but other /proc files
(e.g. /proc/<pid>/pagemap) also differ from normal files in how reads
behave.
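
For example, assuming the proposed semantics where the seek offset is
treated as a start VA (the pid and address below are made up):

#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

int main(void)
{
	char buf[4096];
	ssize_t n;
	int fd = open("/proc/1234/numa_vamaps", O_RDONLY);

	if (fd < 0)
		return 1;
	/* The offset is a virtual address: start reporting at VA
	 * 0x7f0000000000, not at that byte of the file contents. */
	lseek(fd, 0x7f0000000000UL, SEEK_SET);
	n = read(fd, buf, sizeof(buf));
	if (n > 0)
		fwrite(buf, 1, n, stdout);
	close(fd);
	return 0;
}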

Michal Hocko suggested that the existing move_pages() API could be used
to collect the VA to numa node id information. However, the numa_vamaps
/proc file is more efficient than move_pages(). Steven Sistare
suggested optimizing move_pages() for the case when consecutive 4k page
addresses are passed in. I tried out this optimization, and the table
above compares the performance of the move_pages() API against the
numa_vamaps /proc file. In particular, for sparse mappings the
optimization to move_pages() does not help. The performance benefit
seen with the /proc file makes a difference from a usability point of
view.
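
For reference, this is the kind of move_pages(2) query the numbers
above reflect: passing nodes == NULL makes move_pages() a pure query
that fills status[] with the node id of each page, or a negative errno
(e.g. -ENOENT for an unmapped address). One pointer per 4k page is what
makes this expensive over a large address space. A sketch (query_nodes()
is illustrative, not from the patch; link with -lnuma):

#include <numaif.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>

static int query_nodes(pid_t pid, char *base, unsigned long npages)
{
	void **pages = malloc(npages * sizeof(*pages));
	int *status = malloc(npages * sizeof(*status));
	unsigned long i;
	int ret = -1;

	if (!pages || !status)
		goto out;
	for (i = 0; i < npages; i++)
		pages[i] = base + i * 4096;	/* one entry per 4k page */

	/* nodes == NULL: query only, nothing is migrated */
	if (move_pages(pid, npages, pages, NULL, status, 0)) {
		perror("move_pages");
		goto out;
	}
	for (i = 0; i < npages; i++)
		if (status[i] >= 0)
			printf("%p N%d\n", pages[i], status[i]);
	ret = 0;
out:
	free(pages);
	free(status);
	return ret;
}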

Andrew Morton had asked about the performance difference between the
move_pages() API and the numa_vamaps /proc file, and about the use case
for getting VA to numa node id information. I hope the above
description answers those questions.







