Re: [RFC PATCH] Add /proc/<pid>/numa_vamaps for numa node information

"prakash.sangappa" <prakash.sangappa@xxxxxxxxxx> · Mon, 7 May 2018 16:22:15 -0700

On 05/03/2018 03:26 PM, Dave Hansen wrote:
> On 05/03/2018 03:27 PM, prakash.sangappa wrote:
>> If each consecutive page comes from a different node, yes, in
>> the extreme case this file will have a lot of lines. All the lines
>> are generated at the time the file is read. The amount of data read
>> will be limited to the user read buffer size used in the read.
>>
>> /proc/<pid>/pagemap also has a similar issue. There is one 64-bit
>> value for each user page.
> But nobody reads it sequentially.  Everybody lseek()s because it has a
> fixed block size.  You can't do that in text.
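
For reference, the fixed-block-size lookup on pagemap can be sketched
as below. This is only an illustration: the helper names are made up
here, and it decodes just the "present" (bit 63) and "swapped" (bit 62)
flags of the 64-bit entry.

```python
import os
import struct

PAGEMAP_ENTRY_SIZE = 8  # one 64-bit value per virtual page


def pagemap_offset(vaddr, page_size):
    """Byte offset in /proc/<pid>/pagemap of the entry covering vaddr."""
    return (vaddr // page_size) * PAGEMAP_ENTRY_SIZE


def decode_entry(raw):
    """Decode the two flag bits that say where the page currently lives."""
    (value,) = struct.unpack("=Q", raw)  # native byte order, 8 bytes
    return {
        "present": bool((value >> 63) & 1),
        "swapped": bool((value >> 62) & 1),
    }


def read_pagemap_entry(pid, vaddr):
    """Read the single entry for vaddr without scanning the whole file."""
    page_size = os.sysconf("SC_PAGE_SIZE")
    fd = os.open(f"/proc/{pid}/pagemap", os.O_RDONLY)
    try:
        raw = os.pread(fd, PAGEMAP_ENTRY_SIZE, pagemap_offset(vaddr, page_size))
    finally:
        os.close(fd)
    # A short read means the address is outside what pagemap covers.
    return decode_entry(raw) if len(raw) == PAGEMAP_ENTRY_SIZE else None
```

The offset is pure arithmetic on the virtual address, which is what a
line-oriented text file cannot offer.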

The current text-based files in /proc do allow seeking, but that does
not help to seek to a specific VA (vma) to start from, as the seek
offset is the offset in the text. This is the case when using the
'seq_file' interface in the kernel to generate the /proc file content.

However, with the proposed new file, we could allow seeking to a
specified virtual address. The lseek offset in this case would represent
a virtual address of the process. A subsequent read from the file would
provide VA range to numa node information starting from that VA. If the
VA seeked to is invalid, the read will start from the next valid mapped
VA of the process. The implementation would not be based on seq_file.
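
To make the intended seek semantics concrete, here is a small
user-space model of them. It is not the kernel implementation;
resolve_seek() and the (start, end) tuple layout are names assumed for
this sketch only.

```python
import bisect


def resolve_seek(vmas, va):
    """Return the address a read would start from after seeking to va.

    vmas is a sorted, non-overlapping list of (start, end) pairs with
    end exclusive. A va inside a mapping is used as is; otherwise the
    start of the next mapping is used. None means there is nothing
    mapped at or after va.
    """
    starts = [start for start, _ in vmas]
    i = bisect.bisect_right(starts, va) - 1  # last VMA starting at or below va
    if i >= 0 and va < vmas[i][1]:
        return va
    nxt = i + 1
    return vmas[nxt][0] if nxt < len(vmas) else None
```

With the VMAs 00400000-004dd000, 006dc000-006dd000 and
006dd000-006e6000, seeking into the hole at 00500000 would start the
read at 006dc000.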

For example, getting numa node information for a process having the
following VMAs mapped, starting from '006dc000':

00400000-004dd000
006dc000-006dd000
006dd000-006e6000

One can seek to VA 006dc000 and start reading; it would get the following:

006dc000-006dd000 N1=1 kernelpagesize_kB=4 anon=1 dirty=1 file=/usr/bin/bash
006dd000-006de000 N0=1 kernelpagesize_kB=4 anon=1 dirty=1 file=/usr/bin/bash
006de000-006e0000 N1=2 kernelpagesize_kB=4 anon=2 dirty=2 file=/usr/bin/bash
006e0000-006e6000 N0=6 kernelpagesize_kB=4 anon=6 dirty=6 file=/usr/bin/bash
..
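
From user space, consuming the file might look roughly like this. It is
a sketch under assumptions: the file does not exist yet, the line format
is taken from the example above, file names are assumed to contain no
spaces, and all function names are hypothetical.

```python
import os


def parse_line(line):
    """Split one 'start-end key=value ...' line into (start, end, fields)."""
    addr_range, *tokens = line.split()
    start, end = (int(x, 16) for x in addr_range.split("-"))
    fields = dict(tok.split("=", 1) for tok in tokens if "=" in tok)
    return start, end, fields


def node_pages(fields):
    """Extract the N<node>=<pages> counters as {node: pages}."""
    return {
        int(key[1:]): int(val)
        for key, val in fields.items()
        if key.startswith("N") and key[1:].isdigit()
    }


def complete_lines(data):
    """Decode a read buffer, dropping a trailing line cut off by its end."""
    text = data.decode(errors="replace")
    return text[: text.rfind("\n") + 1].splitlines()


def read_from_va(pid, start_va, bufsize=4096):
    """Seek the proposed file to start_va and parse one buffer of lines."""
    fd = os.open(f"/proc/{pid}/numa_vamaps", os.O_RDONLY)  # proposed file
    try:
        os.lseek(fd, start_va, os.SEEK_SET)  # offset is a VA, not a text offset
        data = os.read(fd, bufsize)
    finally:
        os.close(fd)
    return [parse_line(line) for line in complete_lines(data) if line.strip()]
```

A caller wanting more than one buffer would seek again to the end
address of the last parsed range and repeat.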

One advantage of getting numa node information from this /proc file,
versus say using the 'move_pages()' API, is that the /proc file can
provide address range to numa node information, not one page at a time.
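
To illustrate the difference: with move_pages() called with nodes set
to NULL, the caller gets back one node id per page and has to merge
them into ranges itself, roughly as below. The per-page list is assumed
to have been collected already; coalesce() is a hypothetical helper, and
this sketch ignores VMA boundaries.

```python
def coalesce(start_va, page_size, nodes):
    """Merge consecutive pages on the same node into ranges.

    nodes[i] is the node id of the page at start_va + i * page_size.
    Returns a list of (start, end, node, pages) with end exclusive.
    """
    ranges = []
    for i, node in enumerate(nodes):
        va = start_va + i * page_size
        if ranges and ranges[-1][2] == node:
            start, _, _, count = ranges[-1]
            ranges[-1] = (start, va + page_size, node, count + 1)  # extend run
        else:
            ranges.append((va, va + page_size, node, 1))  # start a new run
    return ranges
```

For the ten pages from 006dc000 with per-page nodes 1,0,1,1,0,0,0,0,0,0
this yields the same four ranges as in the example output, which the
proposed file would return directly.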