On 30.03.2015 15:15, Ján Tomko wrote: > On Fri, Mar 27, 2015 at 03:26:53PM +0100, Michal Privoznik wrote: >> Once upon a time, there was a little domain. And the domain was pinned >> onto a NUMA node and hasn't fully allocated its memory: >> >> <memory unit='KiB'>2355200</memory> >> <currentMemory unit='KiB'>1048576</currentMemory> >> >> <numatune> >> <memory mode='strict' nodeset='0'/> >> </numatune> >> >> Oh little me, said the domain, what will I do with so few memory. > > s/few/little/ > >> If I only had a few megabytes more. But the old admin noticed >> whimpering, barely audible to untrained human ear. And good admin he > > the whimpering > the good admin (?) > >> was, he gave the domain yet more memory. But the old NUMA topology >> witch forbidden to allocate more memory on the node zero. So he > > forbidden -> forbade or forbid > >> decided to allocate it on a different node: >> >> virsh # numatune little_domain --nodeset 0-1 >> >> virsh # setmem little_domain 2355200 >> >> The little domain was happy. For a while. Until bad, sharp teeth > > a bad? the bad? > >> shaped creature came. Every process in the system was afraid of him. >> The OOM Killer they called him. Oh no, he's after the little domain. >> There's no escape. >> >> Do you kids know why? Because when the little domain was born, her >> father, Libvirt, called numa_set_membind(). So even if the admin >> allowed her to allocate memory from other nodes in the cgroups, the >> membind() forbid it. >> >> So what's the lesson? Libvirt should rely on cgroups, whenever >> possible and use numa_set_membind() as the last ditch effort. >> >> Signed-off-by: Michal Privoznik <mprivozn@xxxxxxxxxx> >> --- >> src/qemu/qemu_process.c | 32 ++++++++++++++++++++++++++++---- >> 1 file changed, 28 insertions(+), 4 deletions(-) >> > > I don't have much experience proofreading children's books, > but the logic looks okay to me. > >> diff --git a/src/qemu/qemu_process.c b/src/qemu/qemu_process.c >> index 79f763e..cba042d 100644 >> --- a/src/qemu/qemu_process.c >> +++ b/src/qemu/qemu_process.c >> @@ -3154,6 +3154,7 @@ static int qemuProcessHook(void *data) >> int fd; >> virBitmapPtr nodeset = NULL; >> virDomainNumatuneMemMode mode; >> + bool doNuma = true; >> >> /* This method cannot use any mutexes, which are not >> * protected across fork() >> @@ -3185,11 +3186,34 @@ static int qemuProcessHook(void *data) >> goto cleanup; >> >> mode = virDomainNumatuneGetMode(h->vm->def->numa, -1); >> - nodeset = virDomainNumatuneGetNodeset(h->vm->def->numa, >> - priv->autoNodeset, -1); >> >> - if (virNumaSetupMemoryPolicy(mode, nodeset) < 0) >> - goto cleanup; >> + if (mode == VIR_DOMAIN_NUMATUNE_MEM_STRICT) { >> + virCgroupPtr cgroup = NULL; >> + >> + /* Create dummy cgroup ... */ >> + if (virCgroupNewSelf(&cgroup) < 0) >> + goto cleanup; > > The domain's cgroup is accessible under priv->cgroup here, you can use > that one instead. Good point, I though that CGroups are created later in the process. But due to handshaking with child, we are guaranteed that we can access priv->cgroup. Will send v2 in a while. Michal -- libvir-list mailing list libvir-list@xxxxxxxxxx https://www.redhat.com/mailman/listinfo/libvir-list