On Fri, May 17, 2013 at 07:59:36PM +0800, Osier Yang wrote:
> When either "cpuset" of <vcpu> is specified, or the "placement" of
> <vcpu> is "auto", setting only cpuset.mems may cause the guest to
> fail to start. E.g. ("placement" of both <vcpu> and <numatune> is
> "auto"):
>
> 1) Related XMLs
>     <vcpu placement='auto'>4</vcpu>
>     <numatune>
>       <memory mode='strict' placement='auto'/>
>     </numatune>
>
> 2) Host NUMA topology
>     % numactl --hardware
>     available: 8 nodes (0-7)
>     node 0 cpus: 0 4 8 12 16 20 24 28
>     node 0 size: 16374 MB
>     node 0 free: 11899 MB
>     node 1 cpus: 32 36 40 44 48 52 56 60
>     node 1 size: 16384 MB
>     node 1 free: 15318 MB
>     node 2 cpus: 2 6 10 14 18 22 26 30
>     node 2 size: 16384 MB
>     node 2 free: 15766 MB
>     node 3 cpus: 34 38 42 46 50 54 58 62
>     node 3 size: 16384 MB
>     node 3 free: 15347 MB
>     node 4 cpus: 3 7 11 15 19 23 27 31
>     node 4 size: 16384 MB
>     node 4 free: 15041 MB
>     node 5 cpus: 35 39 43 47 51 55 59 63
>     node 5 size: 16384 MB
>     node 5 free: 15202 MB
>     node 6 cpus: 1 5 9 13 17 21 25 29
>     node 6 size: 16384 MB
>     node 6 free: 15197 MB
>     node 7 cpus: 33 37 41 45 49 53 57 61
>     node 7 size: 16368 MB
>     node 7 free: 15669 MB
>
> 3) cpuset.cpus will be set as: (from debug log)
>
>     2013-05-09 16:50:17.296+0000: 417: debug : virCgroupSetValueStr:331 :
>     Set value '/sys/fs/cgroup/cpuset/libvirt/qemu/toy/cpuset.cpus'
>     to '0-63'
>
> 4) The advisory nodeset got from querying numad (from debug log)
>
>     2013-05-09 16:50:17.295+0000: 417: debug : qemuProcessStart:3614 :
>     Nodeset returned from numad: 1
>
> 5) cpuset.mems will be set as: (from debug log)
>
>     2013-05-09 16:50:17.296+0000: 417: debug : virCgroupSetValueStr:331 :
>     Set value '/sys/fs/cgroup/cpuset/libvirt/qemu/toy/cpuset.mems'
>     to '0-7'
>
> I.e., the domain process's memory is restricted to the first NUMA node,
> yet it can use all of the CPUs, which will very likely cause the domain
> process to fail to start because the kernel may fail to allocate memory
> given the possible
> mismatch between CPU nodes and memory nodes.

This is only a problem if the kernel is forced to allocate memory from a
node which matches the CPU node. In general it is perfectly acceptable
for the kernel to allocate memory from a node that is different from the
CPU node; it is the mode='strict' attribute in the XML above that
triggers the bug.

> @@ -665,9 +666,35 @@ qemuSetupCpusetCgroup(virDomainObjPtr vm,
>          }
>      }
>
> +    if (vm->def->cpumask ||
> +        (vm->def->placement_mode ==
> +         VIR_DOMAIN_CPU_PLACEMENT_MODE_AUTO)) {

I think you should only be doing this if placement == auto *and*
mode == strict.

> +        if (vm->def->placement_mode ==
> +            VIR_DOMAIN_CPU_PLACEMENT_MODE_AUTO)
> +            cpu_mask = virBitmapFormat(nodemask);
> +        else
> +            cpu_mask = virBitmapFormat(vm->def->cpumask);
> +
> +        if (!cpu_mask) {
> +            virReportError(VIR_ERR_INTERNAL_ERROR, "%s",
> +                           _("failed to convert memory nodemask"));
> +            goto cleanup;
> +        }
> +
> +        rc = virCgroupSetCpusetCpus(priv->cgroup, cpu_mask);
> +        if (rc != 0) {
> +            virReportSystemError(-rc,
> +                                 _("Unable to set cpuset.cpus for domain %s"),
> +                                 vm->def->name);
> +            goto cleanup;
> +        }
> +    }
> +
>      ret = 0;
>  cleanup:
> -    VIR_FREE(mask);
> +    VIR_FREE(mem_mask);
> +    VIR_FREE(cpu_mask);
>      return ret;

Daniel

-- 
|: http://berrange.com -o- http://www.flickr.com/photos/dberrange/ :|
|: http://libvirt.org -o- http://virt-manager.org :|
|: http://autobuild.org -o- http://search.cpan.org/~danberr/ :|
|: http://entangle-photo.org -o- http://live.gnome.org/gtk-vnc :|

--
libvir-list mailing list
libvir-list@xxxxxxxxxx
https://www.redhat.com/mailman/listinfo/libvir-list