Re: [RFC PATCH 8/8] qemu: Set cpuset.mems even if the numatune mode is not strict

On 13/05/13 14:46, Hu Tao wrote:
On Thu, May 09, 2013 at 06:22:17PM +0800, Osier Yang wrote:
When the numatune memory mode is not "strict", the cpuset.mems
inherits the parent's setting, which causes problems like:

% virsh dumpxml rhel6_local | grep interleave -2
   <vcpu placement='static'>2</vcpu>
   <numatune>
     <memory mode='interleave' nodeset='1-2'/>
   </numatune>
   <os>

% cat /proc/3713/status | grep Mems_allowed_list
   Mems_allowed_list:	0-3

% virsh numatune rhel6_local
   numa_mode      : interleave
   numa_nodeset   : 0-3
Yes the information is misleading.
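
To make the mismatch above easy to check programmatically, here is a minimal sketch that pulls the Mems_allowed_list value out of the text of /proc/<pid>/status so it can be compared against the configured nodeset. The function name extract_mems_allowed_list is hypothetical and is not part of libvirt; reading the file itself is left to the caller.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper (not a libvirt API): copy the value of the
 * "Mems_allowed_list:" field from the text of /proc/<pid>/status
 * into out. Returns 0 on success, -1 if the field is missing,
 * empty, or does not fit into out. */
int
extract_mems_allowed_list(const char *status, char *out, size_t outlen)
{
    static const char key[] = "Mems_allowed_list:";
    const char *p;
    size_t len;

    assert(status != NULL && out != NULL);

    /* Only accept the key at the start of a line. */
    p = status;
    while ((p = strstr(p, key)) != NULL) {
        if (p == status || p[-1] == '\n')
            break;
        p += sizeof(key) - 1;
    }
    if (p == NULL)
        return -1;

    /* Skip the key and the whitespace that separates it from the value. */
    p += sizeof(key) - 1;
    while (*p == ' ' || *p == '\t')
        p++;

    /* The value runs to the end of the line. */
    len = strcspn(p, "\n");
    if (len == 0 || len >= outlen)
        return -1;

    memcpy(out, p, len);
    out[len] = '\0';
    return 0;
}
```

For the domain shown above, the extracted value would be "0-3" while the configured nodeset is "1-2", which is exactly the discrepancy being reported.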

Though the domain process's memory binding is set with libnuma
after the cgroup setting.

The reason for only allowing "strict" mode in current code is that
cpuset.mems doesn't understand the memory policy modes (interleave,
preferred, strict); it is effectively equivalent to "strict" mode.
From man numa(3): "strict" means the allocation will fail if the
memory cannot be allocated on the target node. Default operation is
to fall back to other nodes. Default is localalloc.
However, writing cpuset.mems even if the numatune memory mode is not
strict should be better than the blind inheritance anyway.
It's OK for interleave mode, combined with cpuset.memory_spread_xxx:

- cpuset.memory_spread_page flag: if set, spread page cache evenly on allowed nodes
- cpuset.memory_spread_slab flag: if set, spread slab cache evenly on allowed nodes

Looks reasonable.
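
As a rough illustration of the interleave suggestion, the sketch below writes one of the memory_spread flags in a cpuset cgroup directory. The helper name cpuset_write_flag and any directory passed to it are assumptions made for this example; this is not how libvirt's cgroup code is structured, only a standalone sketch of the file write involved.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper (not a libvirt API): write "1" or "0" to
 * <cgroup_dir>/<flag_file>, e.g. flag_file = "cpuset.memory_spread_page".
 * Returns 0 on success, -1 on any error. */
int
cpuset_write_flag(const char *cgroup_dir, const char *flag_file, int enabled)
{
    char path[4096];
    FILE *fp;
    int n;
    int ret = 0;

    assert(cgroup_dir != NULL && flag_file != NULL);

    /* Build the full path and refuse silently truncated names. */
    n = snprintf(path, sizeof(path), "%s/%s", cgroup_dir, flag_file);
    if (n < 0 || (size_t)n >= sizeof(path))
        return -1;

    if (!(fp = fopen(path, "w")))
        return -1;

    if (fputs(enabled ? "1" : "0", fp) == EOF)
        ret = -1;

    /* Buffered write errors may only show up when the stream is closed. */
    if (fclose(fp) == EOF)
        ret = -1;

    return ret;
}
```

A caller would invoke it once for cpuset.memory_spread_page and once for cpuset.memory_spread_slab, after cpuset.mems has been written for the same group.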

But what about preferred mode? Comparing:

strict:  Strict means the allocation will fail if the memory cannot be
          allocated on the target node.

preferred: The system will attempt to allocate memory from the
            preferred node, but will fall back to other nodes if no
            memory is available on the preferred node.

For "preferred" mode, I have no idea; there are no related cgroup files like
memory_spread_*. If we set cpuset.mems to the nodeset, memory allocation
will behave like 'strict', which is not what is expected.

---
However, I'm not comfortable with the solution, since the modes other
than "strict" are not meaningful for cpuset.mems anyway.

Another problem I'm not sure about is whether the cpuset.cpus will
affect the libnuma setting. Assume that, without this patch, the domain
process's cpuset.mems is set to '0-7' (8 NUMA nodes, each with 8
CPUs), the numatune memory mode is "interleave", and libnuma sets
the memory binding to "1-2". With this patch applied, cpuset.mems is
set to "1-2" as well; is there any potential problem?

So this patch is mainly to raise the problem and to see if anyone
has opinions. @hutao, since this code is from you, any
opinions/ideas? Thanks.
---
  src/qemu/qemu_cgroup.c | 18 +++++++++++++-----
  1 file changed, 13 insertions(+), 5 deletions(-)

diff --git a/src/qemu/qemu_cgroup.c b/src/qemu/qemu_cgroup.c
index 33eebd7..22fe25b 100644
--- a/src/qemu/qemu_cgroup.c
+++ b/src/qemu/qemu_cgroup.c
@@ -597,11 +597,9 @@ qemuSetupCpusetCgroup(virDomainObjPtr vm,
      if (!virCgroupHasController(priv->cgroup, VIR_CGROUP_CONTROLLER_CPUSET))
          return 0;
-    if ((vm->def->numatune.memory.nodemask ||
-         (vm->def->numatune.memory.placement_mode ==
-          VIR_NUMA_TUNE_MEM_PLACEMENT_MODE_AUTO)) &&
-        vm->def->numatune.memory.mode == VIR_DOMAIN_NUMATUNE_MEM_STRICT) {
-
+    if (vm->def->numatune.memory.nodemask ||
+        (vm->def->numatune.memory.placement_mode ==
+         VIR_NUMA_TUNE_MEM_PLACEMENT_MODE_AUTO)) {
          if (vm->def->numatune.memory.placement_mode ==
              VIR_NUMA_TUNE_MEM_PLACEMENT_MODE_AUTO)
              mem_mask = virBitmapFormat(nodemask);
@@ -614,6 +612,16 @@ qemuSetupCpusetCgroup(virDomainObjPtr vm,
              goto cleanup;
          }
+        if (vm->def->numatune.memory.mode ==
+            VIR_DOMAIN_NUMATUNE_MEM_PREFERRED &&
+            strlen(mem_mask) != 1) {
+            virReportError(VIR_ERR_INTERNAL_ERROR, "%s",
+                           _("NUMA memory tuning in 'preferred' mode "
+                             "only supports single node"));
+            goto cleanup;
+
+        }
+
          rc = virCgroupSetCpusetMems(priv->cgroup, mem_mask);
          if (rc != 0) {
--
1.8.1.4
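
One note on the 'preferred' check in the hunk above: strlen(mem_mask) != 1 also rejects a single node numbered 10 or higher, because "10" is two characters long. A minimal sketch of a stricter test is to count the nodes named by the string instead. The function nodeset_count and the NODESET_MAX_NODE bound are hypothetical names for this example only; inside libvirt, counting bits on the bitmap before formatting it would likely be the more natural place for such a check.

```c
#include <assert.h>
#include <ctype.h>
#include <stdlib.h>

/* Arbitrary upper bound chosen for this sketch to keep the arithmetic safe. */
#define NODESET_MAX_NODE 4096

/* Hypothetical helper (not a libvirt API): count the NUMA nodes named
 * by a nodeset string such as "1", "10", "1-2" or "0-3,5".
 * Returns the count, or -1 if the string is malformed.
 * Overlapping ranges are not deduplicated. */
int
nodeset_count(const char *s)
{
    int count = 0;

    assert(s != NULL);

    if (*s == '\0')
        return -1;

    while (*s) {
        char *end;
        long lo, hi;

        /* Each element starts with a node number. */
        if (!isdigit((unsigned char)*s))
            return -1;
        lo = strtol(s, &end, 10);
        hi = lo;
        s = end;

        /* An optional "-N" turns the element into a range. */
        if (*s == '-') {
            s++;
            if (!isdigit((unsigned char)*s))
                return -1;
            hi = strtol(s, &end, 10);
            s = end;
        }

        if (hi < lo || hi > NODESET_MAX_NODE)
            return -1;
        count += (int)(hi - lo + 1);

        /* Elements are separated by commas; nothing else may follow. */
        if (*s == ',') {
            s++;
            if (*s == '\0')
                return -1;
        } else if (*s != '\0') {
            return -1;
        }
    }

    return count;
}
```

With such a helper, the condition in the patch would become nodeset_count(mem_mask) != 1, which accepts "10" and still rejects "1-2".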

--
libvir-list mailing list
libvir-list@xxxxxxxxxx
https://www.redhat.com/mailman/listinfo/libvir-list
