On Tue, Oct 25, 2016 at 02:13:07PM +0200, Martin Kletzander wrote:
> On Tue, Oct 25, 2016 at 01:10:23PM +1100, Sam Bobroff wrote:
> >On Tue, Oct 18, 2016 at 10:43:31PM +0200, Martin Kletzander wrote:
> >>On Mon, Oct 17, 2016 at 03:45:09PM +1100, Sam Bobroff wrote:
> >>>On Fri, Oct 14, 2016 at 10:19:42AM +0200, Martin Kletzander wrote:
> >>>>On Fri, Oct 14, 2016 at 11:52:22AM +1100, Sam Bobroff wrote:
> >>>>>I did look at the libnuma and cgroups approaches, but I was
> >>>>>concerned they wouldn't work in this case, because of the way
> >>>>>QEMU allocates memory when mem-prealloc is used: the memory is
> >>>>>allocated in the main process, before the CPU threads are
> >>>>>created. (This is based only on a bit of hacking and debugging
> >>>>>in QEMU, but it does seem to explain the behaviour I've seen so
> >>>>>far.)
> >>>>>
> >>>>
> >>>>But we use numactl before QEMU is exec()'d.
> >>>
> >>>Sorry, I jumped ahead a bit. I'll try to explain what I mean:
> >>>
> >>>I think the problem with using this method would be that the NUMA
> >>>policy is applied to all allocations by QEMU, not just the ones
> >>>related to the memory backing. I'm not sure if that would cause a
> >>>serious problem, but it seems untidy, and it doesn't happen in
> >>>other situations (i.e. with separate memory backend objects, QEMU
> >>>sets up the policy specifically for each one and other
> >>>allocations aren't affected, AFAIK). Presumably, if memory were
> >>>very restricted, it could prevent the guest from starting.
> >>>
> >>
> >>Yes, it is; that's what <numatune><memory/> does if you don't have
> >>any other (<memnode/>) specifics set.
> >>
> >>>>>I think QEMU could be altered to move the preallocations into
> >>>>>the VCPU threads, but it didn't seem trivial and I suspected
> >>>>>the QEMU community would point out that there was already a way
> >>>>>to do it using backend objects. Another option would be to add
> >>>>>a -host-nodes parameter to QEMU so that the policy could be
> >>>>>given without adding a memory backend object. (That seems like
> >>>>>a more reasonable change to QEMU.)
> >>>>>
> >>>>
> >>>>I think upstream won't like that, mostly because there is
> >>>>already a way, and that is using a memory-backend object. I
> >>>>think we could just use that and disable changing it live. But
> >>>>upstream will probably want that to be configurable or
> >>>>something.
> >>>
> >>>Right, but isn't this already an issue in the cases where libvirt
> >>>is already using memory backend objects and NUMA policy? (Or does
> >>>libvirt already disable changing it live in those situations?)
> >>>
> >>
> >>It is. I'm not trying to say libvirt is perfect. There are bugs,
> >>e.g. like this one. The problem is that we tried to do
> >>*everything*, but it's not currently possible. I'm trying to
> >>explain how stuff works now. It definitely needs some fixing,
> >>though.
> >
> >OK :-)
> >
> >Well, given our discussion, do you think it's worth a v2 of my
> >original patch, or would it be better to drop it in favour of some
> >broader change?
> >
>
> Honestly, I've thought about the approaches so much that I'm now not
> sure I'll make a good decision. An RFC could do. If I were to pick,
> I would go with a new setting that would control whether we want the
> binding to be changeable throughout the domain's lifetime or not, so
> that we can make better decisions (and don't feel bad about the bad
> ones).

I feel the same way. OK, I'll try an RFC patch with a lot of
description. I'm specifically trying to address the issue I
originally raised, which isn't quite the same thing as the
changeability of the bindings, but I'll keep that in mind.
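To make sure we're talking about the same two configurations, the
distinction I have in mind looks roughly like this in the domain XML
(just a sketch; the nodesets are example values):

```xml
<numatune>
  <!-- process-wide policy: numactl-style, applied to all of QEMU's
       allocations before exec() -->
  <memory mode='strict' nodeset='0-1'/>
  <!-- per-guest-node policy: makes libvirt emit a memory-backend
       object, so QEMU applies the policy only to that node's backing
       memory -->
  <memnode cellid='0' mode='strict' nodeset='0'/>
</numatune>
```

i.e. the <memnode/> form is the one that ends up as host-nodes/policy
on a memory-backend object on the QEMU command line.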
I think your point about changing the bindings will apply in the same
way whenever QEMU's memory-backend objects are used with their
"host-nodes" attribute (since those objects are what causes QEMU to
apply policy), so I don't think I'm suggesting any significant change
there.

If you want to add the new setting you mention above, I'd be happy to
base my patch on top of that work. ;-)

Cheers,
Sam.

--
libvir-list mailing list
libvir-list@xxxxxxxxxx
https://www.redhat.com/mailman/listinfo/libvir-list