On Wed, May 28, 2014 at 11:18:30AM +0100, Daniel P. Berrange wrote:
> On Wed, May 28, 2014 at 11:48:31AM +0200, Martin Kletzander wrote:
> > Caveats:
> >  - I'm not sure how cpu hotplug is done with guest numa nodes, but
> >    if there is a possibility to increase the number of numa nodes
> >    (which does not make sense to me from (a) a user's point of view
> >    and (b) our XMLs and APIs), we need to be able to hotplug the
> >    ram as well,
>
> AFAIK, you cannot change the NUMA topology once booted. You'd have
> to add any new CPUs to existing NUMA nodes (assuming they have
> space). Likewise I'd expect that any RAM would have to be added to
> existing defined nodes. Of course ultimately nothing should stop the
> user defining some "empty" NUMA nodes if they want space to add new
> CPUs beyond the initial setup.
>
> >  - virDomainGetNumaParameters() now reflects only the
> >    /domain/numatune/memory settings, not 'memnode' ones,
>
> Yep, that's fine, though we'll likely want to have new APIs to deal
> with the guest NUMA node settings.
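For illustration, a guest that leaves room for CPU hotplug within a fixed node topology might look something like this (a sketch only; the vcpu counts and cell sizes are made-up values, not from this thread):

```xml
<!-- 8 possible vcpus, 4 online at boot; the spare 4 can later be
     hotplugged into the cells already defined here, since the cells
     themselves cannot be added after boot. memory is in KiB. -->
<vcpu current='4'>8</vcpu>
<cpu>
  <numa>
    <cell cpus='0-3' memory='1048576'/>
    <cell cpus='4-7' memory='1048576'/>
  </numa>
</cpu>
```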
I was thinking about how to extend the current one, but all the ideas seemed too "messy".
> >  - virDomainSetNumaParameters() is not allowed when there is some
> >    /domain/numatune/memnode parameter, as we can query memdev info,
> >    but not change it (if I understood the QEMU side correctly),
> >  - when a domain is started, the cpuset.mems cgroup is not modified
> >    for each vcpu; this will be fixed, but the question is how to
> >    handle it for non-strict settings [*],
> >  - automatic numad placement can now be used together with memnode
> >    settings, which IMHO doesn't make any sense, but I was hesitant
> >    to disable that in case somebody has constructive criticism in
> >    this area.
>
> IMHO if you're doing fine-grained configuration of guest <-> host
> NUMA nodes, then you're not going to want numad. numad is really
> serving the use cases where you're lazy and want to ignore NUMA
> settings in the guest config. IOW, I think it is fine to forbid
> numad.
Good, we're on the same page here; I just wanted to make sure before jumping to conclusions. Most of the use cases that depend on numad are not very well thought through, I guess.
> >  - This series alone is broken when used with
> >    /domain/memoryBacking/hugepages, because it will still use the
> >    memory-ram object, but that will be fixed with Michal's patches
> >    on top of this series.
> >
> > One idea how to solve some of the problems is to say that
> > /domain/numatune/memory is set for the whole domain regardless of
> > what anyone puts in /domain/numatune/memnode.
> > virDomainGetNumaParameters() could be extended to report the info
> > for all guest numa nodes, although it seems a new API would suit
> > this kind of information better. But is it really needed when we
> > are not able to modify it live and the information is available in
> > the domain XML?
> >
> > *) does (or should) this:
> >
> >   ...
> >   <numatune>
> >     <memory mode='strict' placement='static' nodeset='0-7'/>
> >     <memnode nodeid='0' mode='preferred' nodeset='7'/>
> >   </numatune>
> >   ...
> >
> > mean what it looks like it means, that is "in guest node 0, prefer
> > allocating from host node 7, but feel free to allocate from 0-6 as
> > well in case you can't use 7, but never try allocating from host
> > nodes 8-15"?
>
> What we have to remember is that there are two different sets of
> threads and memory we're dealing with. There are the vCPU threads
> and the guest RAM allocation as one set, and there are misc emulator
> threads and other QEMU memory allocations. The <memnode> elements
> only apply to guest vCPUs and guest RAM. The <memory> element will
> still apply policy to the other QEMU emulator threads / RAM
> allocations.
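To make the mapping concrete, the <memnode nodeid='0' mode='preferred' nodeset='7'/> example above would, as I understand it, end up as something like the following on the QEMU command line (a sketch only; the size and object id here are invented):

```sh
qemu-system-x86_64 ... \
    -object memory-backend-ram,id=ram-node0,size=1024M,policy=preferred,host-nodes=7 \
    -numa node,nodeid=0,memdev=ram-node0
```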
Yes, my sense is that the emulator should be set according to <memory> (as it's done now, IIUC). That would mean we need to restrict cpuset.cpus for it as well (in case there is no <emulatorpin>). We also need to properly error out on configurations that will fail (strict memory mode with nodes and cpus from different host numa nodes). This should be done for both emulator threads and vcpu threads (it is not done now, btw).
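The check I have in mind is roughly the following (a hypothetical sketch, not libvirt code; the topology map and function names are mine):

```python
# Reject a config where a strict memnode nodeset and the vcpu's pinned
# host CPUs sit on entirely different host NUMA nodes, since strict
# allocation under cpuset.mems would then be bound to fail.

# Assumed host topology for the example: host NUMA node -> host CPU ids.
HOST_TOPOLOGY = {0: {0, 1, 2, 3}, 1: {4, 5, 6, 7}}

def nodes_of_cpus(cpus):
    """Return the set of host NUMA nodes the given host CPUs belong to."""
    return {node for node, node_cpus in HOST_TOPOLOGY.items()
            if node_cpus & cpus}

def check_strict_memnode(mode, mem_nodeset, pinned_cpus):
    """Return True if the config is acceptable, False if it must error out."""
    if mode != "strict":
        return True  # only strict mode makes the mismatch fatal
    # The vcpu must run on CPUs of at least one node it may allocate from.
    return bool(nodes_of_cpus(pinned_cpus) & set(mem_nodeset))

# vcpu pinned to CPUs of host node 0, memory strictly on node 1: reject.
print(check_strict_memnode("strict", {1}, {0, 1}))     # -> False
# memory strictly on node 0, CPUs on node 0: accept.
print(check_strict_memnode("strict", {0}, {0, 1}))     # -> True
```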
> WRT your question about virDomainSetNumaParameters above - I think
> the answer to whether that makes sense to allow is dependent on
> whether it would be able to affect the QEMU emulator threads/RAM
> without affecting the vCPU threads/guest RAM. We'd likely want a new
> virDomainSetNumaNodeParameters API to control settings for the
> <memnode> elements directly.
Yes. It also depends on whether we'll be able to modify the guest node settings in QEMU (if not, then we have to somehow unify when we use host-nodes and when not). Another question that came to my mind just now is whether we want to expose cpuset.memory_migrate as a parameter, too (or set it to some default value), since it will affect performance a lot when cpuset.mems is changed, won't it?

Thank you for the responses,
Martin
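P.S. For reference, the cpuset.memory_migrate knob works roughly like this under the cgroup v1 cpuset controller (illustrative only; the group path below is made up and this would need root):

```sh
# With memory_migrate set to 1, a later change of cpuset.mems migrates
# the pages the tasks have *already* allocated onto the new nodeset --
# which is exactly where the performance cost on a live change comes from.
echo 1   > /sys/fs/cgroup/cpuset/machine/guest/vcpu0/cpuset.memory_migrate
echo 0-7 > /sys/fs/cgroup/cpuset/machine/guest/vcpu0/cpuset.mems
```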
> Regards,
> Daniel
> --
> |: http://berrange.com  -o-  http://www.flickr.com/photos/dberrange/ :|
> |: http://libvirt.org  -o-  http://virt-manager.org :|
> |: http://autobuild.org  -o-  http://search.cpan.org/~danberr/ :|
> |: http://entangle-photo.org  -o-  http://live.gnome.org/gtk-vnc :|
--
libvir-list mailing list
libvir-list@xxxxxxxxxx
https://www.redhat.com/mailman/listinfo/libvir-list