On Fri, Mar 27, 2015 at 03:26:53PM +0100, Michal Privoznik wrote: > Once upon a time, there was a little domain. And the domain was pinned > onto a NUMA node and hasn't fully allocated its memory: > > <memory unit='KiB'>2355200</memory> > <currentMemory unit='KiB'>1048576</currentMemory> > > <numatune> > <memory mode='strict' nodeset='0'/> > </numatune> > > Oh little me, said the domain, what will I do with so few memory. s/few/little/ > If I only had a few megabytes more. But the old admin noticed > whimpering, barely audible to untrained human ear. And good admin he the whimpering the good admin (?) > was, he gave the domain yet more memory. But the old NUMA topology > witch forbidden to allocate more memory on the node zero. So he forbidden -> forbade or forbid > decided to allocate it on a different node: > > virsh # numatune little_domain --nodeset 0-1 > > virsh # setmem little_domain 2355200 > > The little domain was happy. For a while. Until bad, sharp teeth a bad? the bad? > shaped creature came. Every process in the system was afraid of him. > The OOM Killer they called him. Oh no, he's after the little domain. > There's no escape. > > Do you kids know why? Because when the little domain was born, her > father, Libvirt, called numa_set_membind(). So even if the admin > allowed her to allocate memory from other nodes in the cgroups, the > membind() forbid it. > > So what's the lesson? Libvirt should rely on cgroups, whenever > possible and use numa_set_membind() as the last ditch effort. > > Signed-off-by: Michal Privoznik <mprivozn@xxxxxxxxxx> > --- > src/qemu/qemu_process.c | 32 ++++++++++++++++++++++++++++---- > 1 file changed, 28 insertions(+), 4 deletions(-) > I don't have much experience proofreading children's books, but the logic looks okay to me. > diff --git a/src/qemu/qemu_process.c b/src/qemu/qemu_process.c > index 79f763e..cba042d 100644 > --- a/src/qemu/qemu_process.c > +++ b/src/qemu/qemu_process.c > @@ -3154,6 +3154,7 @@ static int qemuProcessHook(void *data) > int fd; > virBitmapPtr nodeset = NULL; > virDomainNumatuneMemMode mode; > + bool doNuma = true; > > /* This method cannot use any mutexes, which are not > * protected across fork() > @@ -3185,11 +3186,34 @@ static int qemuProcessHook(void *data) > goto cleanup; > > mode = virDomainNumatuneGetMode(h->vm->def->numa, -1); > - nodeset = virDomainNumatuneGetNodeset(h->vm->def->numa, > - priv->autoNodeset, -1); > > - if (virNumaSetupMemoryPolicy(mode, nodeset) < 0) > - goto cleanup; > + if (mode == VIR_DOMAIN_NUMATUNE_MEM_STRICT) { > + virCgroupPtr cgroup = NULL; > + > + /* Create dummy cgroup ... */ > + if (virCgroupNewSelf(&cgroup) < 0) > + goto cleanup; The domain's cgroup is accessible under priv->cgroup here, you can use that one instead. > + > + /* ... just to detect if cpuset cgroup is present */ > + if (virCgroupHasController(cgroup, VIR_CGROUP_CONTROLLER_CPUSET)) { > + /* Because if it's not, we must use virNuma* APIs to bind > + * memory onto desired nodes. CGroup way is preferred, as > + * it allows runtime tuning, while virNuma - well, once > + * set and child (qemu) is exec()-ed, we can't do > + * anything about the settings. virNuma* does not take > + * any PID argument after all. */ Can this comment be shortened? Jan
Attachment:
signature.asc
Description: Digital signature
-- libvir-list mailing list libvir-list@xxxxxxxxxx https://www.redhat.com/mailman/listinfo/libvir-list