On Tue, Oct 25, 2016 at 02:13:07PM +0200, Martin Kletzander wrote:
> On Tue, Oct 25, 2016 at 01:10:23PM +1100, Sam Bobroff wrote:
> >On Tue, Oct 18, 2016 at 10:43:31PM +0200, Martin Kletzander wrote:
> >>On Mon, Oct 17, 2016 at 03:45:09PM +1100, Sam Bobroff wrote:
> >>>On Fri, Oct 14, 2016 at 10:19:42AM +0200, Martin Kletzander wrote:
> >>>>On Fri, Oct 14, 2016 at 11:52:22AM +1100, Sam Bobroff wrote:
> >>>>>I did look at the libnuma and cgroups approaches, but I was
> >>>>>concerned they wouldn't work in this case, because of the way
> >>>>>QEMU allocates memory when mem-prealloc is used: the memory is
> >>>>>allocated in the main process, before the CPU threads are
> >>>>>created. (This is based only on a bit of hacking and debugging
> >>>>>in QEMU, but it does seem to explain the behaviour I've seen so
> >>>>>far.)
> >>>>>
> >>>>
> >>>>But we use numactl before QEMU is exec()'d.
> >>>
> >>>Sorry, I jumped ahead a bit. I'll try to explain what I mean:
> >>>
> >>>I think the problem with using this method would be that the NUMA
> >>>policy is applied to all allocations by QEMU, not just the ones
> >>>related to the memory backing. I'm not sure if that would cause a
> >>>serious problem, but it seems untidy, and it doesn't happen in
> >>>other situations (i.e. with separate memory backend objects, QEMU
> >>>sets up the policy specifically for each one and other
> >>>allocations aren't affected, AFAIK). Presumably, if memory were
> >>>very restricted, it could prevent the guest from starting.
> >>>
> >>
> >>Yes, it is; that's what <numatune><memory/> does if you don't have
> >>any other (<memnode/>) specifics set.
> >>
> >>>>>I think QEMU could be altered to move the preallocations into
> >>>>>the VCPU threads, but it didn't seem trivial and I suspected
> >>>>>the QEMU community would point out that there was already a way
> >>>>>to do it using backend objects. Another option would be to add
> >>>>>a -host-nodes parameter to QEMU so that the policy could be
> >>>>>given without adding a memory backend object. (That seems like
> >>>>>a more reasonable change to QEMU.)
> >>>>>
> >>>>
> >>>>I think upstream won't like that, mostly because there is
> >>>>already a way, and that is using a memory-backend object. I
> >>>>think we could just use that and disable changing it live. But
> >>>>upstream will probably want that to be configurable or
> >>>>something.
> >>>
> >>>Right, but isn't this already an issue in the cases where libvirt
> >>>is already using memory backend objects and NUMA policy? (Or does
> >>>libvirt already disable changing it live in those situations?)
> >>>
> >>
> >>It is. I'm not trying to say libvirt is perfect. There are bugs,
> >>e.g. like this one. The problem is that we tried to do
> >>*everything*, but it's not currently possible. I'm trying to
> >>explain how stuff works now. It definitely needs some fixing,
> >>though.
> >
> >OK :-)
> >
> >Well, given our discussion, do you think it's worth a v2 of my
> >original patch, or would it be better to drop it in favour of some
> >broader change?
> >
>
> Honestly, I've thought about the approaches so much that I'm now not
> sure I'll make a good decision. An RFC could do. If I were to pick,
> I would go with a new setting that would control whether we want the
> binding to be changeable throughout the domain's lifetime or not, so
> that we can make better decisions (and don't feel bad about the bad
> ones).

I feel the same way. OK, I'll try an RFC patch with a lot of
description. I'm specifically trying to address the issue I
originally raised, which isn't quite the same thing as the
changeability of the bindings, but I'll keep that in mind.
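To make sure we're talking about the same two configurations, the
distinction I have in mind looks roughly like this in the domain XML
(just a sketch; the nodesets are example values):

```xml
<numatune>
  <!-- process-wide policy: numactl-style, applied to all of QEMU's
       allocations before exec() -->
  <memory mode='strict' nodeset='0-1'/>
  <!-- per-guest-node policy: makes libvirt emit a memory-backend
       object, so QEMU applies the policy only to that node's backing
       memory -->
  <memnode cellid='0' mode='strict' nodeset='0'/>
</numatune>
```

i.e. the <memnode/> form is the one that ends up as host-nodes/policy
on a memory-backend object on the QEMU command line.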
I think your point about changing the bindings will apply in the same
way whenever QEMU's memory-backend objects are used with their
"host-nodes" attribute (since those objects are what causes QEMU to
apply policy), so I don't think I'm suggesting any significant change
there.

If you want to add the new setting you mention above, I'd be happy to
base my patch on top of that work. ;-)

Cheers,
Sam.

--
libvir-list mailing list
libvir-list@xxxxxxxxxx
https://www.redhat.com/mailman/listinfo/libvir-list