Re: [RFC PATCH] NUMA tuning support


 



On 05/06/2011 04:43, Bill Gray wrote:

Thanks for the feedback, Lee!

One reason to use "membind" instead of "preferred" is that one can
prefer only a single node. For large guests, you can specify multiple
nodes with "membind". I think "preferred" would be preferred if it
allowed multiple nodes.

- Bill
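
For context, the asymmetry Bill describes is visible in the libnuma API
itself: the preferred policy takes a single node, while membind takes a
whole nodemask. A minimal sketch (node numbers are just examples; build
with -lnuma):

#include <numa.h>

static void policy_examples(void)
{
    struct bitmask *nodes;

    if (numa_available() < 0)
        return;

    /* "preferred": exactly one node can be named. */
    numa_set_preferred(1);

    /* "membind": an arbitrary set of nodes, e.g. 0-1 for a large guest. */
    nodes = numa_allocate_nodemask();
    if (nodes) {
        numa_bitmask_setbit(nodes, 0);
        numa_bitmask_setbit(nodes, 1);
        numa_set_membind(nodes);
        numa_bitmask_free(nodes);
    }
}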

Hi, Bill

Will "preferred" be still useful even if it only support single node?

Regards
Osier


On 05/05/2011 10:33 AM, Lee Schermerhorn wrote:
On Thu, 2011-05-05 at 17:38 +0800, Osier Yang wrote:
Hi, All,

This is a simple implementation of NUMA tuning support based on the
binary program 'numactl'. Currently it only supports binding memory to
specified nodes, using the "--membind" option; it probably needs to
support more, but I'd like to send it early to make sure the principle
is correct.

Ideally, NUMA tuning support should be added to qemu-kvm first, so that
it could provide command-line options and all libvirt would need to do
is pass those options through to qemu-kvm. Unfortunately qemu-kvm
doesn't support it yet, so all we can do for now is use numactl. That
forks a process, which is a bit more expensive than qemu-kvm doing the
NUMA tuning internally with libnuma, but I guess it shouldn't affect
much.

The NUMA tuning XML is like:

<numatune>
<membind nodeset='+0-4,8-12'/>
</numatune>
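
For illustration only, assuming the nodeset attribute above is handed
straight to numactl's --membind option, the wrapper libvirt forks could
look roughly like this in the child process (the qemu-kvm path and
argument list are placeholders, not the actual patch):

#include <stdio.h>
#include <unistd.h>

static void exec_with_membind(const char *nodeset)
{
    char membind[128];
    char *argv[8];

    /* e.g. nodeset = "+0-4,8-12", taken straight from the XML attribute */
    snprintf(membind, sizeof(membind), "--membind=%s", nodeset);

    argv[0] = "numactl";
    argv[1] = membind;
    argv[2] = "/usr/bin/qemu-kvm";   /* placeholder: real qemu argv follows */
    argv[3] = NULL;

    execvp(argv[0], argv);           /* only returns on failure */
    perror("execvp");
}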

Any thoughts/feedback is appreciated.

Osier:

A couple of thoughts/observations:

1) you can accomplish the same thing -- restricting a domain's memory to
a specified set of nodes -- using the cpuset cgroup that is already
associated with each domain. E.g.,

cgset -r cpuset.mems=<nodeset> /libvirt/qemu/<domain>

Or the equivalent libcgroup call.
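
For completeness, a rough sketch of what the equivalent libcgroup call
could look like -- it just sets cpuset.mems on the domain's existing
cgroup; the group path is illustrative and most error handling is
omitted:

#include <libcgroup.h>

static int set_domain_mems(const char *group, const char *nodeset)
{
    struct cgroup *cg;
    struct cgroup_controller *ctrl;
    int ret = -1;

    if (cgroup_init() != 0)
        return -1;

    cg = cgroup_new_cgroup(group);              /* e.g. "libvirt/qemu/<domain>" */
    if (!cg)
        return -1;

    if (cgroup_get_cgroup(cg) == 0 &&           /* read current settings */
        (ctrl = cgroup_get_controller(cg, "cpuset")) != NULL &&
        cgroup_set_value_string(ctrl, "cpuset.mems", nodeset) == 0)
        ret = cgroup_modify_cgroup(cg);         /* write the change back */

    cgroup_free(&cg);
    return ret;
}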

However, numactl is more flexible; especially if you intend to support
more policies: preferred, interleave. Which leads to the question:

2) Do you really want the full "membind" semantics as opposed to
"preferred" by default? Membind policy will restrict the VMs pages to
the specified nodeset and will initiate reclaim/stealing and wait for
pages to become available or the task is OOM-killed because of mempolicy
when all of the nodes in nodeset reach their minimum watermark. Membind
works the same as cpuset.mems in this respect. Preferred policy will
keep memory allocations [but not vcpu execution] local to the specified
set of nodes as long as there is sufficient memory, and will silently
"overflow" allocations to other nodes when necessary. I.e., it's a
little more forgiving under memory pressure.
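
To make the difference concrete, the two policies correspond to the
kernel's set_mempolicy(2) modes; a sketch with made-up nodemasks (the
comments summarize the behaviour described above; link with -lnuma):

#include <numaif.h>   /* set_mempolicy(), MPOL_BIND, MPOL_PREFERRED */

static long bind_strict(void)
{
    unsigned long mask = 0x3UL;          /* nodes 0-1, illustrative */
    /* MPOL_BIND: allocations never leave the mask; under pressure the task
     * reclaims/waits and can end up OOM-killed. */
    return set_mempolicy(MPOL_BIND, &mask, sizeof(mask) * 8);
}

static long bind_preferred(void)
{
    unsigned long mask = 0x1UL;          /* node 0 only */
    /* MPOL_PREFERRED: allocations start on the preferred node but silently
     * overflow to other nodes when it runs short. */
    return set_mempolicy(MPOL_PREFERRED, &mask, sizeof(mask) * 8);
}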

But then pinning a VM's vcpus to the physical cpus of a set of nodes and
retaining the default local allocation policy will have the same effect
as "preferred" while ensuring that the VM component tasks execute
locally to the memory footprint. Currently, I do this by looking up the
cpulist associated with the node[s] from e.g.,
/sys/devices/system/node/node<i>/cpulist and using that list with the
vcpu.cpuset attribute. Adding a 'nodeset' attribute to the
cputune.vcpupin element would simplify specifying that configuration.
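
As a small illustration of that lookup, reading the cpulist file is a
one-line sysfs read (error handling trimmed, buffer size arbitrary):

#include <stdio.h>

static int node_cpulist(int node, char *buf, int len)
{
    char path[128];
    FILE *fp;

    snprintf(path, sizeof(path),
             "/sys/devices/system/node/node%d/cpulist", node);
    fp = fopen(path, "r");
    if (!fp)
        return -1;
    if (!fgets(buf, len, fp)) {    /* e.g. "0-7\n" on an 8-cpu node */
        fclose(fp);
        return -1;
    }
    fclose(fp);
    return 0;
}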

Regards,
Lee





