Re: numa_alloc_onnode does not allocate on node passed as argument

Andres

So, a few more pieces of information.

My suspicion about the influence of shared libraries and their access
pattern across NUMA nodes was confirmed by this article:
http://lwn.net/Articles/568870/
(I have not dug into the code yet to understand it better.)

So with MPOL_DEFAULT the system decides on its own how best to place memory
and whether memory should be migrated.
Also, mbind() returns an error when MPOL_DEFAULT is used together with a
nonempty nodemask (as in your code without the preferred-node setting); see
the EINVAL section of man mbind.
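
For illustration, here is a minimal sketch (not your test program; it assumes
node 1 exists and needs linking with -lnuma) showing that mbind() rejects a
nonempty nodemask under MPOL_DEFAULT but accepts the same mask under MPOL_BIND:

#include <numaif.h>
#include <sys/mman.h>
#include <cerrno>
#include <cstring>
#include <iostream>
using namespace std;

int main() {
   size_t len = 16 * 4096;
   void* p = mmap(NULL, len, PROT_READ | PROT_WRITE,
                  MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
   if(p == MAP_FAILED)
      return 1;
   unsigned long nodemask = 1UL << 1;  // node 1 only

   // Nonempty nodemask with MPOL_DEFAULT -> EINVAL (see man 2 mbind)
   if(mbind(p, len, MPOL_DEFAULT, &nodemask, sizeof(nodemask) * 8, 0) < 0)
      cout << "MPOL_DEFAULT + nodemask: " << strerror(errno) << endl;

   // The same nodemask with MPOL_BIND is accepted
   if(mbind(p, len, MPOL_BIND, &nodemask, sizeof(nodemask) * 8, 0) < 0)
      cout << "MPOL_BIND + nodemask: " << strerror(errno) << endl;
   else
      cout << "MPOL_BIND + nodemask: ok" << endl;

   munmap(p, len);
   return 0;
}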

So it looks like it is better to set the policy explicitly if you want a
predictable result, since the default policy can behave like this.
Although, as was recently claimed, automatic NUMA placement is getting close
to the performance of manual placement.
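
For what it is worth, the explicit-policy version of such an allocation could
look roughly like this (just a sketch built on the documented libnuma calls,
assuming node 1 is the intended target):

#include <cstddef>
#include <cstdint>
#include <numa.h>
#include <iostream>
using namespace std;

int main() {
   if(numa_available() < 0) {
      cerr << "NUMA is not available on this system" << endl;
      return 1;
   }
   numa_set_strict(1);      // fail instead of silently falling back to another node
   numa_set_preferred(1);   // process-wide preferred node, as in the commented-out line

   size_t s = 5000 * sizeof(int64_t);
   int64_t* x = (int64_t*)numa_alloc_onnode(s, 1);  // request the pages on node 1
   if(x == NULL) {
      cerr << "allocation on node 1 failed" << endl;
      return 1;
   }
   for(int64_t j = 0; j < 5000; j++)  // touch the pages so they are actually faulted in
      x[j] = j;
   numa_free(x, s);
   return 0;
}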

Andreas

Just a few thoughts.
Maybe the success of your test is related to the node workloads? Available
memory? Shared library placement? A non-default system policy?
Maybe you can show numa_maps for the pid? (A small helper for dumping it from
inside the test is sketched below.)
I also wonder why mbind does not return "Invalid argument" in your test case.
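
Something like this hypothetical helper (a sketch, not part of your program)
could be dropped into the test and called right after the allocations, so we
can see where libnuma, the heap and the allocated ranges ended up:

#include <fstream>
#include <iostream>
#include <string>
using namespace std;

// Dump /proc/self/numa_maps so the placement of the heap, the stack and
// shared libraries such as libnuma can be inspected at runtime.
void dump_numa_maps() {
   ifstream maps("/proc/self/numa_maps");
   string line;
   while(getline(maps, line))
      cout << line << endl;
}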


Thank you
Elena


On Fri, Oct 31, 2014 at 6:35 AM, Andreas Hollmann <hollmann@xxxxxxxxx> wrote:
> Have you built numactl yourself and tried to run the included tests?
>
> This is what I've tried on my machine and the test succeeded.
>
> wget ftp://oss.sgi.com/www/projects/libnuma/download/numactl-2.0.10.tar.gz
> tar xvf numactl-2.0.10.tar.gz
> cd numactl-2.0.10/
> ./configure
> make
> make test
>
> 2014-10-31 0:21 GMT+01:00 Andres Nötzli <noetzli@xxxxxxxxxxxx>:
>> Hi Elena,
>>
>> Thank you so much for looking into this issue. It is good to hear that you are getting the same strange result.
>>
>> I posted the output of /proc/pid/numa_maps here: https://gist.github.com/4tXJ7f/5e89f466e29cd1f7f1aa
>>
>> I hope this helps.
>>
>> Thanks again,
>> Andres
>>
>>> On 29 Oct 2014, at 21:33, Elena Ufimtseva <ufimtseva@xxxxxxxxx> wrote:
>>>
>>> Hello Andres
>>>
>>> I looked at the example you gave, ran multiple variations, and
>>> got the same strange results.
>>> The default local policy should be in use when no other policy is defined.
>>> The only thing that comes to mind is the shared library libnuma,
>>> which has its data on a different node than the one I try to run
>>> the test process on.
>>> Can you take a look and check which node is used by libnuma in
>>> /proc/pid/numa_maps?
>>>
>>> I will keep searching for an answer; it is a rather interesting topic.
>>> Or maybe someone else can give more details on this.
>>>
>>> Thank you!
>>>
>>>
>>> On Thu, Oct 23, 2014 at 1:17 PM, Andres Nötzli <noetzli@xxxxxxxxxxxx> wrote:
>>>> Hi Elena,
>>>>
>>>> That would be great! I created a gist with the kernel config (cat /boot/config-$(uname -r)): https://gist.github.com/4tXJ7f/408a562abe5d4f28656d
>>>>
>>>> Please let me know if you need anything else.
>>>>
>>>> Thank you very much,
>>>> Andres
>>>>
>>>>> On 23 Oct 2014, at 06:15, Elena Ufimtseva <ufimtseva@xxxxxxxxx> wrote:
>>>>>
>>>>> Hi Andres
>>>>>
>>>>> I will poke around this on the weekend on my NUMA machine.
>>>>> Can you also attach your kernel config please?
>>>>>
>>>>> Thank you.
>>>>>
>>>>> On Wed, Oct 22, 2014 at 12:40 PM, Andres Nötzli <noetzli@xxxxxxxxxxxx> wrote:
>>>>>> Hi Elena,
>>>>>>
>>>>>> Thank you very much for your quick reply! numa_set_strict(1) and numa_set_strict(0) both result in the wrong output. I did not change the default policy.
>>>>>>
>>>>>> numa_get_membind returns 1 for all nodes before and after numa_run_on_node.
>>>>>> numa_get_interleave_mask returns 0 for all nodes.
>>>>>> numa_get_run_node_mask is all 1s before and 0010 after numa_run_on_node.
>>>>>>
>>>>>> The machine config (the CPUs are all Intel(R) Xeon(R) CPU E5-4657L v2 @ 2.40GHz):
>>>>>>
>>>>>> $ numactl --hardware
>>>>>> available: 4 nodes (0-3)
>>>>>> node 0 cpus: 0 1 2 3 4 5 6 7 8 9 10 11 48 49 50 51 52 53 54 55 56 57 58 59
>>>>>> node 0 size: 262093 MB
>>>>>> node 0 free: 966 MB
>>>>>> node 1 cpus: 12 13 14 15 16 17 18 19 20 21 22 23 60 61 62 63 64 65 66 67 68 69 70 71
>>>>>> node 1 size: 262144 MB
>>>>>> node 1 free: 82 MB
>>>>>> node 2 cpus: 24 25 26 27 28 29 30 31 32 33 34 35 72 73 74 75 76 77 78 79 80 81 82 83
>>>>>> node 2 size: 262144 MB
>>>>>> node 2 free: 102 MB
>>>>>> node 3 cpus: 36 37 38 39 40 41 42 43 44 45 46 47 84 85 86 87 88 89 90 91 92 93 94 95
>>>>>> node 3 size: 262144 MB
>>>>>> node 3 free: 113 MB
>>>>>> node distances:
>>>>>> node   0   1   2   3
>>>>>> 0:  10  20  30  20
>>>>>> 1:  20  10  20  30
>>>>>> 2:  30  20  10  20
>>>>>> 3:  20  30  20  10
>>>>>>
>>>>>> Thanks again,
>>>>>> Andres
>>>>>>
>>>>>>> On 22 Oct 2014, at 06:12, Elena Ufimtseva <ufimtseva@xxxxxxxxx> wrote:
>>>>>>>
>>>>>>> On Tue, Oct 21, 2014 at 11:47 PM, Andres Nötzli <noetzli@xxxxxxxxxxxx> wrote:
>>>>>>>> Hi everyone,
>>>>>>>>
>>>>>>>> I am experiencing a weird problem. When using numa_alloc_onnode repeatedly to allocate memory, it does not allocate memory on the node passed as an argument.
>>>>>>>>
>>>>>>>> Sample code:
>>>>>>>> #include <cstdint>
>>>>>>>> #include <numa.h>
>>>>>>>> #include <numaif.h>
>>>>>>>> #include <iostream>
>>>>>>>> using namespace std;
>>>>>>>>
>>>>>>>> void find_memory_node_for_addr(void* ptr) {
>>>>>>>>    int numa_node = -1;
>>>>>>>>    if(get_mempolicy(&numa_node, NULL, 0, ptr, MPOL_F_NODE | MPOL_F_ADDR) < 0)
>>>>>>>>       cout << "WARNING: get_mempolicy failed" << endl;
>>>>>>>>    cout << numa_node << endl;
>>>>>>>> }
>>>>>>>>
>>>>>>>> int main() {
>>>>>>>>    int64_t* x;
>>>>>>>>    int64_t n = 5000;
>>>>>>>>    //numa_set_preferred(1);
>>>>>>>>
>>>>>>>>    numa_run_on_node(2);
>>>>>>>>    for(int i = 0; i < 20; i++) {
>>>>>>>>       size_t s = n * sizeof(int64_t);
>>>>>>>>       x = (int64_t*)numa_alloc_onnode(s, 1);
>>>>>>>>       for(int j = 0; j < n; j++)
>>>>>>>>          x[j] = j + i;
>>>>>>>>       find_memory_node_for_addr(x);
>>>>>>>>    }
>>>>>>>>
>>>>>>>>    return 0;
>>>>>>>> }
>>>>>>>>
>>>>>>>> Output:
>>>>>>>> 1
>>>>>>>> 1
>>>>>>>> 1
>>>>>>>> 2
>>>>>>>> 1
>>>>>>>> 2
>>>>>>>> 1
>>>>>>>> 2
>>>>>>>> 1
>>>>>>>> 2
>>>>>>>> 1
>>>>>>>> 2
>>>>>>>> 1
>>>>>>>> 2
>>>>>>>> 1
>>>>>>>> 2
>>>>>>>> 1
>>>>>>>> 2
>>>>>>>> 1
>>>>>>>> 2
>>>>>>>>
>>>>>>>> When uncommenting the line "numa_set_preferred(1);", the output is all 1s as expected. Am I doing something wrong? Have you seen similar issues?
>>>>>>>>
>>>>>>>> I am running Ubuntu 12.04.5 LTS:
>>>>>>>> $ cat /proc/version
>>>>>>>> Linux version 3.2.0-29-generic (buildd@allspice) (gcc version 4.6.3 (Ubuntu/Linaro 4.6.3-1ubuntu5) ) #46-Ubuntu SMP Fri Jul 27 17:03:23 UTC 2012
>>>>>>>>
>>>>>>>> I am using libnuma 2.0.10 but I’ve had the same problem with 2.0.8~rc3-1.
>>>>>>>>
>>>>>>>> Thank you very much,
>>>>>>>> Andres
>>>>>>>
>>>>>>> Hi Andres
>>>>>>>
>>>>>>> Can you try using a strict policy by calling numa_set_strict()?
>>>>>>>
>>>>>>> If you comment out setting the preferred node, the default policy is
>>>>>>> in effect (I assume you did not change it, neither for the process nor
>>>>>>> system-wide), which is also a preferred policy.
>>>>>>> But here you set the preferred policy to a specific node, and the manual says
>>>>>>> the default for a process is to allocate on the node it runs on.
>>>>>>> So I wonder what the CPU affinity for this process looks like...
>>>>>>> Also, maybe just to confirm, can you check the policy from within your
>>>>>>> running code?
>>>>>>>
>>>>>>> Can you also post the machine NUMA config?
>>>>>>>
>>>>>>> --
>>>>>>> Elena
>>>>>>
>>>>>>
>>>>>
>>>>>
>>>>>
>>>>> --
>>>>> Elena
>>>>
>>>>
>>>
>>>
>>>
>>> --
>>> Elena
>>
>>



-- 
Elena



