Re: No system call to determine MAX_NUMNODES?

On 2/13/19 3:25 PM, Florian Weimer wrote:
> * Vlastimil Babka:
> 
>> On 2/7/19 1:27 AM, Alexander Duyck wrote:
>>> On Wed, Feb 6, 2019 at 3:13 PM Ralph Campbell <rcampbell@xxxxxxxxxx> wrote:
>>>>
>>>> I was using the latest git://git.cmpxchg.org/linux-mmotm.git and noticed
>>>> a new issue compared to 5.0.0-rc5.
>>>>
>>>> It looks like there is no convenient way to query the kernel's value for
>>>> MAX_NUMNODES, yet it is used in kernel_get_mempolicy() to validate the
>>>> 'maxnode' parameter to the get_mempolicy(2) system call: if 'maxnode' is
>>>> smaller than MAX_NUMNODES, EINVAL is returned.
>>>>
>>>> Searching the internet for get_mempolicy yields some references that
>>>> recommend reading /proc/<pid>/status and parsing the line "Mems_allowed:".
>>>>
>>>> Running "cat /proc/self/status | grep Mems_allowed:" I get:
>>>> With 5.0.0-rc5:
>>>> Mems_allowed:   00000000,00000001
>>>> With 5.0.0-rc5-mm1:
>>>> Mems_allowed:   1
>>>> (both kernels were config'ed with CONFIG_NODES_SHIFT=6)
>>>>
>>>> Clearly, there should be a better way to query MAX_NUMNODES, such as
>>>> sysconf(), sysctl(), or libnuma.
>>> 
>>> Really we shouldn't need to know that. That just tells us about how
>>> the kernel was built, it doesn't really provide any information about
>>> the layout of the system.
>>> 
>>>> I searched for the patch that changed /proc/self/status but didn't find it.
>>> 
>>> The patch you are looking for is located at:
>>> http://lkml.kernel.org/r/1545405631-6808-1-git-send-email-longman@xxxxxxxxxx
>>
>> Hmm, it looks like libnuma [1] uses that /proc/self/status parsing approach for
>> numa_num_possible_nodes(), and it's also mentioned in man numa(3). A comment in the
>> code mentions that libcpuset does that as well. I'm afraid we can't just break this.
> 
> Oh-oh.  This looks utterly broken to me in the face of process
> migration.

MAX_NUMNODES, and thus the layout of /proc/self/status, is a build-time constant
of the kernel, so it won't change after migration between VMs, if that's what
you're asking. CRIU might be affected if a restore is done on a kernel with a
different MAX_NUMNODES.

> Is this used for anything important?  Perhaps sizing data structures in
> user space?

libnuma seems to parse it only once and then remember the result for
everything else, so there shouldn't be, e.g., a mismatch between buffer
allocation and writing to it.

> Thanks,
> Florian
> 