When the numatune memory mode is not "strict", cpuset.mems is simply
inherited from the parent cgroup, which causes problems like:

% virsh dumpxml rhel6_local | grep interleave -2
  <vcpu placement='static'>2</vcpu>
  <numatune>
    <memory mode='interleave' nodeset='1-2'/>
  </numatune>
  <os>

% cat /proc/3713/status | grep Mems_allowed_list
Mems_allowed_list:      0-3

% virsh numatune rhel6_local
numa_mode      : interleave
numa_nodeset   : 0-3

This happens even though the domain process's memory binding is set
with libnuma after the cgroup is configured.

The current code only allows "strict" mode because cpuset.mems does not
understand the memory policy modes (interleave, preferred, strict); it
is effectively equivalent to "strict" mode. From numa(3): "Strict means
the allocation will fail if the memory cannot be allocated on the
target node. Default operation is to fall back to other nodes."

Even so, writing cpuset.mems when the numatune memory mode is not
"strict" should be better than blindly inheriting the parent's setting.

---
However, I'm not comfortable with this solution, since the modes other
than "strict" are not meaningful for cpuset.mems anyway.

Another thing I'm not sure about: does cpuset.cpus affect the libnuma
setting? Assume that, without this patch, the domain process's
cpuset.mems is set to "0-7" (8 NUMA nodes, each with 8 CPUs), the
numatune memory mode is "interleave", and libnuma sets the memory
binding to "1-2". With this patch applied, cpuset.mems is set to "1-2"
as well; could that cause any problem?

So this patch is mainly meant to raise the problem and gather opinions.

@hutao, since this code is yours, do you have any opinions or ideas?

Thanks.
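Both open questions come down to how a nodeset string maps to a set of
NUMA nodes: whether the libnuma nodeset ("1-2") lies inside cpuset.mems
("0-3" when inherited, "1-2" with this patch), and how the 'preferred'
check in the patch decides that mem_mask names a single node. The
following C sketch is a hypothetical illustration, not libvirt code:
parse_nodeset(), count_nodes(), nodeset_is_subset() and is_single_node()
are made-up helpers, and it assumes masks of the "0-3,5" form with node
ids below 64 (no "^" exclusions).

```c
#include <assert.h>
#include <ctype.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical helper (not libvirt code): parse a nodeset such as "1-2"
 * or "0,2-3" into a 64-bit mask, bit N set for NUMA node N.
 * Returns 0 on success, -1 on malformed input or node ids >= 64.
 * libvirt's "^N" exclusion syntax is not handled. */
int parse_nodeset(const char *s, unsigned long long *mask)
{
    assert(s != NULL && mask != NULL);
    *mask = 0;

    for (;;) {
        char *end;
        long lo, hi;

        if (!isdigit((unsigned char)*s))
            return -1;              /* empty set, stray ',' or junk */
        lo = strtol(s, &end, 10);
        hi = lo;
        if (*end == '-') {          /* range "lo-hi" */
            if (!isdigit((unsigned char)end[1]))
                return -1;
            hi = strtol(end + 1, &end, 10);
        }
        if (lo > hi || hi >= 64)
            return -1;
        for (long n = lo; n <= hi; n++)
            *mask |= 1ULL << n;

        if (*end == '\0')
            return 0;               /* consumed the whole string */
        if (*end != ',')
            return -1;
        s = end + 1;                /* next comma-separated entry */
    }
}

/* Number of nodes in the mask (count of set bits). */
int count_nodes(unsigned long long mask)
{
    int n = 0;

    while (mask) {
        mask &= mask - 1;           /* clear the lowest set bit */
        n++;
    }
    return n;
}

/* True if every node in 'inner' is also in 'outer', e.g. a libnuma
 * policy nodeset checked against cpuset.mems. */
bool nodeset_is_subset(unsigned long long inner, unsigned long long outer)
{
    return (inner & ~outer) == 0;
}

/* Single-node test based on parsed nodes rather than string length:
 * "10" is one node even though it is two characters long. */
bool is_single_node(const char *nodeset)
{
    unsigned long long mask;

    return parse_nodeset(nodeset, &mask) == 0 && count_nodes(mask) == 1;
}
```

With these helpers, nodeset_is_subset() holds for a "1-2" policy inside
either "0-3" or "1-2", and is_single_node("10") is true, whereas
strlen("10") != 1 would make the check in the patch reject node 10.
Counting the nodes in the mask, rather than measuring the formatted
string, is one way the single-node check could be expressed.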
---
 src/qemu/qemu_cgroup.c | 18 +++++++++++++-----
 1 file changed, 13 insertions(+), 5 deletions(-)

diff --git a/src/qemu/qemu_cgroup.c b/src/qemu/qemu_cgroup.c
index 33eebd7..22fe25b 100644
--- a/src/qemu/qemu_cgroup.c
+++ b/src/qemu/qemu_cgroup.c
@@ -597,11 +597,9 @@ qemuSetupCpusetCgroup(virDomainObjPtr vm,
     if (!virCgroupHasController(priv->cgroup, VIR_CGROUP_CONTROLLER_CPUSET))
         return 0;
 
-    if ((vm->def->numatune.memory.nodemask ||
-         (vm->def->numatune.memory.placement_mode ==
-          VIR_NUMA_TUNE_MEM_PLACEMENT_MODE_AUTO)) &&
-        vm->def->numatune.memory.mode == VIR_DOMAIN_NUMATUNE_MEM_STRICT) {
-
+    if (vm->def->numatune.memory.nodemask ||
+        (vm->def->numatune.memory.placement_mode ==
+         VIR_NUMA_TUNE_MEM_PLACEMENT_MODE_AUTO)) {
         if (vm->def->numatune.memory.placement_mode ==
             VIR_NUMA_TUNE_MEM_PLACEMENT_MODE_AUTO)
             mem_mask = virBitmapFormat(nodemask);
@@ -614,6 +612,16 @@ qemuSetupCpusetCgroup(virDomainObjPtr vm,
             goto cleanup;
         }
 
+        if (vm->def->numatune.memory.mode ==
+            VIR_DOMAIN_NUMATUNE_MEM_PREFERRED &&
+            strlen(mem_mask) != 1) {
+            virReportError(VIR_ERR_INTERNAL_ERROR, "%s",
+                           _("NUMA memory tuning in 'preferred' mode "
+                             "only supports single node"));
+            goto cleanup;
+
+        }
+
         rc = virCgroupSetCpusetMems(priv->cgroup, mem_mask);
 
         if (rc != 0) {
-- 
1.8.1.4

--
libvir-list mailing list
libvir-list@xxxxxxxxxx
https://www.redhat.com/mailman/listinfo/libvir-list