On Thu, May 05, 2011 at 10:33:46AM -0400, Lee Schermerhorn wrote:
> On Thu, 2011-05-05 at 17:38 +0800, Osier Yang wrote:
> > Hi, All,
> >
> > This is a simple implementation of NUMA tuning support based on the
> > binary program 'numactl'. Currently it only supports binding memory to
> > specified nodes, using the "--membind" option. Perhaps it needs to
> > support more, but I'd like to send it early so we can make sure the
> > principle is correct.
> >
> > Ideally, NUMA tuning support should be added to qemu-kvm first, so
> > that it could provide command-line options; then all libvirt would
> > need to do is pass those options through to qemu-kvm. Unfortunately
> > qemu-kvm doesn't support this yet, so all we can do currently is use
> > numactl. It forks a process, which is a bit more expensive than
> > qemu-kvm doing the NUMA tuning internally with libnuma, but it
> > shouldn't affect much, I guess.
> >
> > The NUMA tuning XML looks like:
> >
> > <numatune>
> >   <membind nodeset='+0-4,8-12'/>
> > </numatune>
> >
> > Any thoughts/feedback is appreciated.
>
> Osier:
>
> A couple of thoughts/observations:
>
> 1) You can accomplish the same thing -- restricting a domain's memory
> to a specified set of nodes -- using the cpuset cgroup that is already
> associated with each domain. E.g.,
>
>     cgset -r cpuset.mems=<nodeset> /libvirt/qemu/<domain>
>
> Or the equivalent libcgroup call.
>
> However, numactl is more flexible, especially if you intend to support
> more policies: preferred, interleave. Which leads to the question:
>
> 2) Do you really want the full "membind" semantics as opposed to
> "preferred" by default? Membind policy will restrict the VM's pages to
> the specified nodeset and will initiate reclaim/stealing and wait for
> pages to become available, or the task will be OOM-killed because of
> mempolicy when all of the nodes in the nodeset reach their minimum
> watermark. Membind works the same as cpuset.mems in this respect.
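As a sketch of the mechanism the patch describes -- wrapping the qemu-kvm command line with a numactl invocation -- the two policies Lee contrasts map onto numactl's "--membind" and "--preferred" options. The function name and 'preferred' handling below are illustrative assumptions, not the patch's actual code:

```python
def build_numa_wrapper(qemu_argv, mode, nodeset):
    """Prepend a numactl invocation implementing the given NUMA policy.

    mode 'membind' strictly restricts allocations to the nodeset
    (allocations block or OOM under pressure); mode 'preferred' keeps
    allocations local while there is free memory, silently overflowing
    to other nodes when necessary.
    """
    if mode == "membind":
        prefix = ["numactl", "--membind=" + nodeset]
    elif mode == "preferred":
        # numactl's --preferred takes a single node, not a nodeset
        prefix = ["numactl", "--preferred=" + nodeset]
    else:
        raise ValueError("unknown NUMA policy: %s" % mode)
    return prefix + qemu_argv
```

For example, build_numa_wrapper(["qemu-kvm", "-m", "4096"], "membind", "0-4,8-12") yields the argv that exec's qemu-kvm under the strict policy; the fork/exec of numactl is the extra cost mentioned above compared to qemu-kvm calling libnuma itself.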
> Preferred policy will keep memory allocations [but not vcpu execution]
> local to the specified set of nodes as long as there is sufficient
> memory, and will silently "overflow" allocations to other nodes when
> necessary. I.e., it's a little more forgiving under memory pressure.

I think we need to make the choice of strict binding vs. preferred
binding an XML tunable, since both options are valid.

Daniel

-- 
|: http://berrange.com  -o-  http://www.flickr.com/photos/dberrange/ :|
|: http://libvirt.org  -o-  http://virt-manager.org :|
|: http://autobuild.org  -o-  http://search.cpan.org/~danberr/ :|
|: http://entangle-photo.org  -o-  http://live.gnome.org/gtk-vnc :|

--
libvir-list mailing list
libvir-list@xxxxxxxxxx
https://www.redhat.com/mailman/listinfo/libvir-list
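One possible shape for such an XML tunable, extending the proposed <numatune> element with a mode attribute; the element and attribute names here are hypothetical, not a settled schema:

```xml
<numatune>
  <!-- mode='strict' ~ numactl --membind; mode='preferred' ~ --preferred -->
  <memory mode='strict' nodeset='0-4,8-12'/>
</numatune>
```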