Re: NUMA page allocation from next Node

On Fri, 2010-10-29 at 12:52 -0700, Tim Pepper wrote:
> On Fri 29 Oct at 07:35:35 +0530 btharindu@xxxxxxxxx said:
> > Finally I could isolate the issue further.
> > I tried following kernels and hardware.
> > Issue is visible only with IBM + SLES 11.
> > 
> > 1. SLES 11 + IBM HW --> Issue is Visible
> > 2. SLES 11 + HP, Sun HW --> Issue is not Visible
> > 3. 2.6.32 Vanilla + Any HW --> Issue is not Visible
> > 4. 2.6.36 Vanilla + Any HW --> Issue is not Visible
> 
> It would be interesting to see the output of "numactl --hardware" for each
> of these scenarios.
> 

Also, could you add "mminit_loglevel=2" to the boot command line and
grep the dmesg output for 'zonelist general'?  The general zonelists for
the Normal zones will show the order of allocation for the two nodes.  On
a 2 node [AMD] platform, I see:

xxx(lts)dmesg | grep 'zonelist general'
mminit::zonelist general 0:DMA = 0:DMA 
mminit::zonelist general 0:DMA32 = 0:DMA32 0:DMA 
mminit::zonelist general 0:Normal = 0:Normal 0:DMA32 0:DMA 1:Normal 
mminit::zonelist general 1:Normal = 1:Normal 0:Normal 0:DMA32 0:DMA 

So, node 0 Normal zone allocates from 0:Normal first, as expected, and
then falls back via DMA32, DMA [both on node 0] and eventually to node 1
Normal.  Node 1 starts locally and falls back to Node 0 Normal and,
finally, the node 0 DMA zones.
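
If the raw lines are hard to read, something like the following might
help.  It is only a sketch: the function name zonelist_order is made up
for illustration, and it assumes the 'zonelist general' lines have the
same layout as the output shown above [source zone, '=', then the zones
in fallback order].

```sh
# Print each zonelist as "source: first -> second -> ...".
# Reads dmesg-style lines on stdin; lines without "zonelist general"
# are ignored.  Assumes the layout shown in the output above.
zonelist_order() {
    awk '/zonelist general/ {
        # Locate the "=" field; the field before it is the source zone.
        for (i = 1; i <= NF; i++) if ($i == "=") break
        out = $(i - 1) ":"
        sep = " "
        # Everything after "=" is the fallback order.
        for (j = i + 1; j <= NF; j++) { out = out sep $j; sep = " -> " }
        print out
    }'
}
```

You would run it as 'dmesg | zonelist_order'.  A node whose Normal
list does not start with its own Normal zone would be worth a closer look.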

You can also try:

cat /proc/zoneinfo | egrep '^Node|^  pages|^ +present'

and maybe "watch" that [watch(1)] while you run your tests.
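
If you only care about free pages per zone, a reduced view may be easier
to follow while the test runs.  This is a rough sketch: zone_free is a
hypothetical helper name, and it assumes /proc/zoneinfo has "Node N, zone
NAME" headers followed by a "  pages free" line, which is what the egrep
pattern above relies on too.

```sh
# Print one "node N ZONE free=PAGES" line per zone.
# Takes an optional file argument; defaults to /proc/zoneinfo.
zone_free() {
    awk '/^Node/          { node = $2; sub(/,/, "", node); zone = $4 }
         /^  pages free/  { print "node " node " " zone " free=" $3 }' \
        "${1:-/proc/zoneinfo}"
}
```

Since watch(1) cannot call a shell function directly, a plain loop such
as 'while sleep 1; do zone_free; done' in the same shell should do.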

And, just to be sure, you could suspend your dd job [^Z] and take a look
at its mempolicy and such via /proc/<pid>/status [Mems_allowed*] and
its /proc/<pid>/numa_maps.  If you haven't changed anything, you should
see both nodes in Mems_allowed[_list], and all of the policies in the
numa_maps should show 'default'.
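
A quick way to check the numa_maps part without reading every line is to
count the mappings per policy.  A minimal sketch, assuming the policy is
the second whitespace-separated field of each numa_maps line;
policy_counts is just an illustrative name:

```sh
# Count numa_maps lines per memory policy (second field of each line).
# Reads numa_maps content on stdin and prints "policy count" pairs.
policy_counts() {
    awk '{ count[$2]++ } END { for (p in count) print p, count[p] }'
}
```

For example, 'policy_counts < /proc/<pid>/numa_maps' with the pid of the
suspended dd, alongside 'grep Mems_allowed /proc/<pid>/status'.  Anything
other than a single 'default' line would mean some policy has been set.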

Andi already mentioned zone_reclaim_mode.  You'll want that set to '0'
if you want allocations to overflow/fallback to off-node without
attempting direct reclaim first.  E.g., set vm.zone_reclaim_mode = 0 in
your /etc/sysctl.conf and reload via 'sysctl -p' if you want it to
stick.
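
To see what the running kernel currently has, you can read the value back
from /proc/sys/vm/zone_reclaim_mode.  The helper below is only a sketch
[reclaim_status is a made-up name]; it treats '0' as "off" and any other
value as "some form of zone reclaim is on", without decoding the
individual bits.

```sh
# Report whether zone reclaim is enabled.
# Takes an optional file argument; defaults to the live kernel setting.
reclaim_status() {
    mode=$(cat "${1:-/proc/sys/vm/zone_reclaim_mode}") || return 1
    if [ "$mode" -eq 0 ]; then
        echo "zone_reclaim_mode=0: off-node fallback without reclaiming first"
    else
        echo "zone_reclaim_mode=$mode: local reclaim attempted before off-node fallback"
    fi
}
```

To change it on the running system without editing sysctl.conf, run
'sysctl -w vm.zone_reclaim_mode=0' as root; that setting will not
survive a reboot.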

Regards,
Lee
