On Thu, 2011-05-05 at 17:38 +0800, Osier Yang wrote:
> Hi, All,
>
> This is a simple implementation of NUMA tuning support based on the
> binary program 'numactl'. Currently it only supports binding memory to
> specified nodes, using the option "--membind"; perhaps it needs to
> support more, but I'd like to send it out early to make sure the
> principle is correct.
>
> Ideally, NUMA tuning support should be added to qemu-kvm first, so
> that it could provide command-line options; then all libvirt would
> need to do is pass those options through to qemu-kvm. Unfortunately
> qemu-kvm doesn't support it yet, so all we can do currently is use
> numactl. It forks a process, which is a bit more expensive than
> qemu-kvm supporting NUMA tuning internally with libnuma, but I guess
> it shouldn't matter much.
>
> The NUMA tuning XML looks like:
>
> <numatune>
>   <membind nodeset='+0-4,8-12'/>
> </numatune>
>
> Any thoughts/feedback is appreciated.

Osier:

A couple of thoughts/observations:

1) You can accomplish the same thing -- restricting a domain's memory to
a specified set of nodes -- using the cpuset cgroup that is already
associated with each domain. E.g.:

    cgset -r cpuset.mems=<nodeset> /libvirt/qemu/<domain>

Or the equivalent libcgroup call. However, numactl is more flexible,
especially if you intend to support more policies: preferred,
interleave. Which leads to the question:

2) Do you really want the full "membind" semantics, as opposed to
"preferred", by default? Membind policy will restrict the VM's pages to
the specified nodeset; it will initiate reclaim/stealing and wait for
pages to become available, or the task will be OOM-killed because of
mempolicy when all of the nodes in the nodeset reach their minimum
watermark. Membind works the same as cpuset.mems in this respect.
Preferred policy will keep memory allocations [but not vcpu execution]
local to the specified set of nodes as long as there is sufficient
memory, and will silently "overflow" allocations to other nodes when
necessary.
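As an aside, the nodeset syntax in the proposed XML ('+0-4,8-12') mixes
single nodes and ranges. A minimal bash sketch of expanding such a spec
into an explicit node list (the function name expand_nodeset is made up
for illustration, and the leading '+' is simply stripped here rather
than interpreted):

```shell
# Expand a libvirt-style nodeset spec, e.g. "+0-4,8-12", into an
# explicit space-separated list of node numbers.
expand_nodeset() {
    local spec="${1#+}"   # drop a leading '+', if present
    local part out=()
    IFS=',' read -ra parts <<< "$spec"
    for part in "${parts[@]}"; do
        if [[ "$part" == *-* ]]; then
            # Range like "0-4": expand with seq; word splitting is
            # intentional so each number becomes one array element.
            out+=( $(seq "${part%-*}" "${part#*-}") )
        else
            out+=( "$part" )
        fi
    done
    echo "${out[@]}"
}

expand_nodeset "+0-4,8-12"   # prints: 0 1 2 3 4 8 9 10 11 12
```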
I.e., it's a little more forgiving under memory pressure.

But then, pinning a VM's vcpus to the physical cpus of a set of nodes
while retaining the default local allocation policy will have the same
effect as "preferred", while also ensuring that the VM's component tasks
execute local to the memory footprint. Currently, I do this by looking
up the cpulist associated with the node[s] from, e.g.:

    /sys/devices/system/node/node<i>/cpulist

and using that list with the vcpu.cpuset attribute. Adding a 'nodeset'
attribute to the cputune.vcpupin element would simplify specifying that
configuration.

Regards,
Lee

--
libvir-list mailing list
libvir-list@xxxxxxxxxx
https://www.redhat.com/mailman/listinfo/libvirt-list
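The configuration Lee describes might be sketched as domain XML like the
following; the node number and cpu range are made-up values for
illustration, and the nodeset attribute in the second fragment is only
the proposal from this mail, not an existing libvirt attribute:

```xml
<!-- Assumption: /sys/devices/system/node/node0/cpulist reads "0-3". -->
<!-- Today: copy that cpulist into the vcpu cpuset attribute by hand. -->
<vcpu cpuset='0-3'>4</vcpu>

<!-- Proposed: let libvirt resolve the node's cpulist itself. -->
<cputune>
  <vcpupin vcpu='0' nodeset='0'/>
</cputune>
```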