Re: Questions about the numa support implementation of Linux

Andi Kleen <andi@xxxxxxxxxxxxxx> · Sun, 22 Nov 2009 16:41:41 +0100

On Sun, Nov 22, 2009 at 04:07:27AM +0200, Christos Margiolas wrote:
> Hello,
> For a few weeks now I have been working on a MapReduce implementation for NUMA
> systems (Cell BE and AMD64), and I have searched the web and IRC

A single-socket Cell is not really a classical cc-NUMA system, if you are
referring to the SPEs, because they don't share a common address space with
the other CPUs. NUMA policy is only for systems with a cache-coherent common
address space.

> channels for information about NUMA support on Linux. I have studied
> the libnuma v2.1 library, written some test programs, and I
> understand the library API well enough. But I still have quite a few
> questions, and I think they relate to kernel support. I will try to
> phrase my questions as simply as possible in order to save your time.
> 
> I know that the operating system (Linux) runs on a single node
> and is not distributed over the nodes.

That's not correct; 64-bit Linux uses all nodes. 32-bit Linux has some
limitations, but it's normally not used in NUMA mode.
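Since the kernel manages every node, it exports the node topology through sysfs. A minimal sketch, assuming the standard layout under /sys/devices/system/node (the directory is absent on kernels built without CONFIG_NUMA, hence the fallback); parse_list, online_nodes and node_cpus are illustrative names, not part of any library:

```python
import os

NODE_ROOT = "/sys/devices/system/node"  # standard sysfs location (assumed present on NUMA kernels)


def parse_list(text):
    """Parse the kernel's list format, e.g. "0-3,8" -> [0, 1, 2, 3, 8]."""
    ids = []
    for part in text.strip().split(","):
        if not part:
            continue  # empty list, e.g. a memory-only node has no CPUs
        if "-" in part:
            lo, hi = part.split("-")
            ids.extend(range(int(lo), int(hi) + 1))
        else:
            ids.append(int(part))
    return ids


def online_nodes():
    """Return the online NUMA node ids, or [0] if no node directory is exposed."""
    try:
        with open(os.path.join(NODE_ROOT, "online")) as f:
            return parse_list(f.read())
    except FileNotFoundError:
        return [0]  # e.g. a kernel built without CONFIG_NUMA


def node_cpus(node):
    """Return the CPUs that belong to `node`, read from its cpulist file."""
    with open(os.path.join(NODE_ROOT, f"node{node}", "cpulist")) as f:
        return parse_list(f.read())


if __name__ == "__main__":
    for n in online_nodes():
        try:
            print(f"node {n}: cpus {node_cpus(n)}")
        except FileNotFoundError:
            print(f"node {n}: no cpulist exposed")
```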

> 
> a) When a process or one of its threads is executing on a node (not the
> kernel's node), does the local memory (of the node executing the process or
> thread) hold a copy of the text and data segments, or are there always

Linux has no user text duplication in the standard kernel. There were
some experimental out-of-tree patches. Some non-x86 ports do text duplication
for the kernel text.

In practice it can be emulated by running multiple copies of the
executables, though.

> references to the memory of the node with the OS, where the data and
> text segments originally reside? If the latter is true, this is a big
> bottleneck.

It is not.

Remember that most NUMA systems today have large caches, fast
interconnects and relatively low NUMA factors (< 1:2). Also, CPUs are
quite good at prefetching code.

A lot of the classical NUMA papers you might have read
were written for systems with much worse NUMA factors and slow
interconnects, and the wisdom in there does not necessarily 
apply.
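The NUMA factor mentioned above can be estimated from the firmware's distance table (ACPI SLIT), which Linux exposes per node in /sys/devices/system/node/nodeN/distance. By convention the local distance is 10, so a remote entry of 20 corresponds to the 1:2 ratio. A minimal sketch under those assumptions; the values are firmware estimates rather than measured latencies, and parse_distances, numa_factor and read_row are hypothetical helpers:

```python
def parse_distances(text):
    """Parse one row of the node distance table, e.g. "10 21" -> [10, 21]."""
    return [int(x) for x in text.split()]


def numa_factor(row, local):
    """Worst-case remote/local ratio for the node that owns this row.

    `local` is the index of the node's own entry in the row (10 by SLIT
    convention), so [10, 20] means remote access is rated at about 2x local.
    """
    return max(row) / row[local]


def read_row(node):
    """Read the distance row for `node` from sysfs (assumed layout)."""
    with open(f"/sys/devices/system/node/node{node}/distance") as f:
        return parse_distances(f.read())


if __name__ == "__main__":
    try:
        row = read_row(0)  # node 0's own entry is the first one in its row
        print(f"node 0 distances {row}, NUMA factor {numa_factor(row, 0):.1f}")
    except FileNotFoundError:
        print("no NUMA distance table exposed")
```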

> b) Are the memory allocation functions in the standard C library
> API (malloc, calloc, realloc) aware of the NUMA memory policy? Or is it
> only possible to allocate memory effectively with respect to NUMA binding
> restrictions by using the libnuma calls?

When malloc et al. allocate fresh memory, they are bound by the NUMA policy
of the thread. For reallocated or freed-and-reused objects it's a bit more
difficult, although as long as you don't malloc in one thread and free in
another, it tends to work out too, because malloc has per-thread pools.

There are also some custom allocators that are NUMA-aware.
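The pattern that keeps malloc's per-thread pools node-local is: pin a worker to one node's CPUs, allocate and first-touch its buffers there, and free them in the same thread. A minimal Python sketch of that pattern, assuming the sysfs layout under /sys/devices/system/node and the default (local) memory policy. CPython serves large buffers such as this bytearray from the C library's malloc, but the interpreter adds its own layer, so treat this as an illustration of the pattern rather than of glibc internals; cpus_of_node, worker and run_per_node are hypothetical names:

```python
import mmap
import os
import threading


def parse_list(text):
    """Parse the kernel's list format, e.g. "0-3,8" -> [0, 1, 2, 3, 8]."""
    ids = []
    for part in text.strip().split(","):
        if part:
            lo, _, hi = part.partition("-")
            ids.extend(range(int(lo), int(hi or lo) + 1))
    return ids


def cpus_of_node(node):
    """CPUs of `node` this process may use; all allowed CPUs as a fallback."""
    allowed = os.sched_getaffinity(0)
    try:
        with open(f"/sys/devices/system/node/node{node}/cpulist") as f:
            cpus = set(parse_list(f.read())) & allowed
    except FileNotFoundError:
        cpus = set()
    return cpus or allowed  # no usable CPUs on that node: run unpinned


def worker(node, size, task, results):
    # pid 0 means "the calling thread", so only this thread is pinned.
    os.sched_setaffinity(0, cpus_of_node(node))
    buf = bytearray(size)
    # Write every page from this pinned thread so the first touch happens
    # here; under the default policy a page lands on the toucher's node.
    for off in range(0, size, mmap.PAGESIZE):
        buf[off] = 1
    results[node] = task(buf)
    del buf  # released by the same thread that allocated it


def run_per_node(nodes, size, task):
    """Run `task` once per node on a node-local buffer; return {node: result}."""
    results = {}
    threads = [threading.Thread(target=worker, args=(n, size, task, results))
               for n in nodes]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results
```

For a MapReduce job, each map worker would be one such pinned thread working on its own buffers.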

> 
> c) If a strict memory policy was specified, which designates specific
> nodes, and those nodes run out of memory, will the system kill the
> process, or will it use swap as it does on a normal uniform memory
> system?

It will swap or otherwise free memory.
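With a strict binding, memory pressure on the bound nodes is handled by reclaim (swapping or dropping caches) rather than by immediately killing the process. To watch how much free memory a bound node has left, the per-node counters in /sys/devices/system/node/nodeN/meminfo can be read. A minimal parsing sketch, assuming the "Node N Field: value [kB]" line format; parse_node_meminfo and node_free_kb are hypothetical helpers:

```python
import re

# One line of nodeN/meminfo, e.g. "Node 0 MemFree:   123456 kB".
LINE = re.compile(r"Node\s+(\d+)\s+(\S+):\s+(\d+)(?:\s+kB)?")


def parse_node_meminfo(text):
    """Return {field: value}; fields reported in kB are kept in kB."""
    info = {}
    for line in text.splitlines():
        m = LINE.match(line.strip())
        if m:
            info[m.group(2)] = int(m.group(3))
    return info


def node_free_kb(node):
    """Free memory on `node` in kB, read from sysfs (assumed layout)."""
    with open(f"/sys/devices/system/node/node{node}/meminfo") as f:
        return parse_node_meminfo(f.read())["MemFree"]
```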

> d) When a processor binding function (scheduler or NUMA API call)
> returns, has the scheduler already applied the requested policy, or will
> the changes take effect after the next context switch? I have the same
> question about applying a memory policy.

It takes effect immediately.
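For CPU binding this can be checked directly: if the calling thread's current CPU is outside the new mask, the kernel migrates it before sched_setaffinity returns, and the new mask is visible at once. A minimal sketch using Python's os.sched_setaffinity and os.sched_getaffinity (pid 0 means the calling thread). Python's os module has no set_mempolicy wrapper, so the memory side is not shown; pin_to and current_cpu are hypothetical helpers:

```python
import os
import threading


def current_cpu():
    """CPU this thread last ran on: field 39 ("processor") of its stat file."""
    path = f"/proc/self/task/{threading.get_native_id()}/stat"
    with open(path) as f:
        stat = f.read()
    # comm (field 2) may contain spaces, so split after its closing ")".
    fields = stat.rsplit(")", 1)[1].split()
    return int(fields[36])  # fields[0] is field 3 (state)


def pin_to(cpu):
    """Pin the calling thread to one CPU and return the mask now in force."""
    os.sched_setaffinity(0, {cpu})
    return os.sched_getaffinity(0)  # already reflects the new mask


if __name__ == "__main__":
    original = os.sched_getaffinity(0)
    target = max(original)
    try:
        mask = pin_to(target)
        print(f"mask {mask}, running on CPU {current_cpu()}")
    finally:
        os.sched_setaffinity(0, original)  # restore the inherited mask
```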

> 
> e) Also, the shared libs run only on the kernel's node, right?

There's no kernel node.

> I would also appreciate any links or study material about NUMA on Linux.

http://halobates.de/numaapi3.pdf is a somewhat outdated introduction.

-Andi

-- 
ak@xxxxxxxxxxxxxxx -- Speaking for myself only.
--
To unsubscribe from this list: send the line "unsubscribe linux-numa" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html