On Tue, Jul 29, 2014 at 05:05:23PM +0100, Daniel P. Berrange wrote:
> On Tue, Jul 29, 2014 at 04:40:50PM +0200, Peter Krempa wrote:
> > On 07/24/14 17:03, Peter Krempa wrote:
> > > On 07/24/14 16:40, Daniel P. Berrange wrote:
> > >> On Thu, Jul 24, 2014 at 04:30:43PM +0200, Peter Krempa wrote:
> > >>> On 07/24/14 16:21, Daniel P. Berrange wrote:
> > >>>> On Thu, Jul 24, 2014 at 02:20:22PM +0200, Peter Krempa wrote:
> >
> > >>
> > >>>> So from that POV, I'd say that when we initially configure the
> > >>>> NUMA / huge page information for a guest at boot time, we should
> > >>>> be doing that wrt the 'maxMemory' size, instead of the current
> > >>>> 'memory' size, i.e. the actual NUMA topology is all set up upfront
> > >>>> even though the DIMMs are not present for some of this topology.
> > >>>>
> > >>>>> "address" determines the address in the guest's memory space where
> > >>>>> the memory will be mapped. This is optional and not recommended to
> > >>>>> be set by the user (except for special cases).
> > >>>>>
> > >>>>> For expansion the model="pflash" device may be added.
> > >>>>>
> > >>>>> For migration the target VM needs to be started with the hotplugged
> > >>>>> modules already specified on the command line, which is in line with
> > >>>>> how we treat devices currently.
> > >>>>>
> > >>>>> My suggestion above contrasts with the approach Michal and Martin took
> > >>>>> when adding the numa and hugepage backing capabilities, as they describe
> > >>>>> a node while this describes the memory device beneath it. I think those
> > >>>>> two approaches can co-exist whilst being mutually exclusive. Simply,
> > >>>>> when using memory hotplug, the memory will need to be specified using
> > >>>>> the memory modules. Non-hotplug guests could use the approach defined
> > >>>>> originally.
> > >>>>
> > >>>> I don't think it is viable to have two different approaches for
> > >>>> configuring NUMA / huge page information. Apps should not have to
> > >>>> change the way they configure NUMA/hugepages when they decide they
> > >>>> want to take advantage of DIMM hotplug.
> > >>>
> > >>> Well, the two approaches are orthogonal in the information they store.
> > >>> The existing approach stores the memory topology from the point of view
> > >>> of the numa node, whereas the <device> based approach stores it from
> > >>> the point of view of the memory module.
> > >>
> > >> Sure, they are clearly designed from different POVs, but I'm saying that
> > >> from an application POV it is very unpleasant to have 2 different ways
> > >> to configure the same concept in the XML. So I really don't want us to
> > >> go down that route unless there is absolutely no other option to achieve
> > >> an acceptable level of functionality. If that really were the case, then
> > >> I would strongly consider reverting everything related to NUMA that we
> > >> have just done during this dev cycle and not releasing it as is.
> > >>
> > >>> The difference is that the existing approach currently wouldn't allow
> > >>> splitting a numa node into more memory devices to allow
> > >>> plugging/unplugging them.
> > >>
> > >> There's no reason why we have to assume 1 memory slot per guest or
> > >> per node when booting the guest. If the user wants the ability to
> > >> unplug, they could set their XML config so the guest has arbitrary
> > >> slot granularity, e.g. if I have a guest with
> > >>
> > >>  - memory == 8 GB
> > >>  - max-memory == 16 GB
> > >>  - NUMA nodes == 4
> > >>
> > >> then we could allow them to specify 32 memory slots, each 512 MB
> > >> in size. This would allow them to plug/unplug memory from NUMA
> > >> nodes in 512 MB granularity.
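Purely as an illustration of that default layout - the <maxMemory> and
per-module <memory> elements below are hypothetical names, since that part
of the schema is exactly what is being discussed in this thread - the
auto-generated config for the 8 GB / 16 GB / 4 node example might look
something like this, with four 512 MiB modules populated per guest node at
boot and the remaining 16 slots left free for future hotplug:

  <maxMemory slots='32' unit='GiB'>16</maxMemory>
  <cpu>
    <numa>
      <cell id='0' cpus='0' memory='2097152'/>
      <cell id='1' cpus='1' memory='2097152'/>
      <cell id='2' cpus='2' memory='2097152'/>
      <cell id='3' cpus='3' memory='2097152'/>
    </numa>
  </cpu>
  <devices>
    <!-- one of the 16 populated 512 MiB modules; each cell's @memory
         is the sum of the modules assigned to that guest node -->
    <memory model='dimm'>
      <target>
        <size unit='KiB'>524288</size>
        <node>0</node>
      </target>
    </memory>
    <!-- 15 further 512 MiB modules, 4 per guest node, omitted here -->
  </devices>

The exact element names don't matter for the argument; the point is only
that the slot granularity is chosen by the app (or defaulted by libvirt)
at boot time.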
> > In real hardware you still can plug in modules of different sizes
> > (e.g. 1 GiB + 2 GiB) ...
>
> I was just illustrating that as an example of the default we'd
> write into the XML if the app hadn't explicitly given any slot
> info themselves. If doing it manually you can of course list
> the slots with arbitrary sizes, each a different size.
>
> > > Well, while this makes it pretty close to real hardware, the emulated
> > > one doesn't have a problem with plugging "dimms" of weird
> > > (non-power-of-2) sizes. And we are losing flexibility due to that.
> > >
> >
> > Hmm, now that the rest of the hugepage stuff was pushed and the release
> > is rather soon, what approach should I take? I'd rather avoid crippling
> > the interface for memory hotplug and having to add separate APIs and
> > other stuff, and mostly I'd like to avoid having to re-do it after
> > consumers of libvirt deem it to be inflexible.
>
> NB, as a general point of design, it isn't our goal to always directly
> expose every possible way of configuring things that QEMU allows. If
> there are multiple ways to achieve the same end goal it is valid for
> libvirt to pick a particular approach and not expose all possible QEMU
> flexibility. This is especially true if this makes cross-hypervisor
> support of the feature more practical.
>
> Looking at the big picture, we've got a bunch of memory related
> configuration sets:
>
> - Guest NUMA topology setup, assigning vCPUs and RAM to guest nodes
>
>     <cpu>
>       <numa>
>         <cell id='0' cpus='0' memory='512000'/>
>         <cell id='1' cpus='1' memory='512000'/>
>         <cell id='2' cpus='2-3' memory='1024000'/>
>       </numa>
>     </cpu>
>
> - Request the use of huge pages, optionally with a different size
>   per guest NUMA node
>
>     <memoryBacking>
>       <hugepages/>
>     </memoryBacking>
>
>     <memoryBacking>
>       <hugepages>
>         <page size='2048' unit='KiB' nodeset='0,1'/>
>         <page size='1' unit='GiB' nodeset='2'/>
>       </hugepages>
>     </memoryBacking>
>
> - Mapping of guest NUMA nodes to host NUMA nodes
>
>     <numatune>
>       <memory mode="strict" nodeset="1-4,^3"/>
>       <memnode cellid="0" mode="strict" nodeset="1"/>
>       <memnode cellid="1" mode="strict" nodeset="2"/>
>     </numatune>
>
> At the QEMU level, aside from the size of the DIMM, the memory slot
> device lets you
>
>  1. Specify the guest NUMA node to attach to
>  2. Specify the host NUMA node to assign to
>  3. Request use of huge pages, optionally with a size

[snip]

> So I think it is valid for libvirt to expose the memory slot feature
> just specifying the RAM size and the guest NUMA node, and infer huge
> page usage, huge page size and host NUMA node from the existing data
> that libvirt has elsewhere in its domain XML document.

I meant to outline how I thought hotplug/unplug would interact with the
existing data.

When first booting the guest:

 - If the XML does not include any memory slot info, we should add the
   minimum possible number of memory slots to match the per-guest NUMA
   node config.

 - If the XML does include slots, then we must validate that the sum of
   the memory for the slots listed against each guest NUMA node matches
   the memory set in /cpu/numa/cell/@memory.

When hugepages are in use we need to make sure we validate that we're
adding slots whose size is a multiple of the huge page size. The code
should already be validating that each NUMA node's memory is a multiple
of the configured huge page size for that node.

When hotplugging / unplugging:

 - Libvirt would update the /cpu/numa/cell/@memory attribute and the
   /memory element to reflect the newly added/removed DIMM.
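To make those two boot-time checks and the hotplug bookkeeping concrete,
here is a sketch using the same hypothetical module element as above (the
names are illustrative only, not settled syntax). Guest node 0 is backed by
1 GiB pages, so every module targeted at it has to be a multiple of 1 GiB,
and the modules listed against it have to add up to its cell/@memory value:

  <memoryBacking>
    <hugepages>
      <page size='1' unit='GiB' nodeset='0'/>
    </hugepages>
  </memoryBacking>

  <cpu>
    <numa>
      <!-- 2 GiB on guest node 0 -->
      <cell id='0' cpus='0' memory='2097152'/>
    </numa>
  </cpu>

  <devices>
    <!-- two 1 GiB modules: each a multiple of the 1 GiB page size,
         together summing to the 2097152 KiB in cell/@memory -->
    <memory model='dimm'>
      <target>
        <size unit='KiB'>1048576</size>
        <node>0</node>
      </target>
    </memory>
    <memory model='dimm'>
      <target>
        <size unit='KiB'>1048576</size>
        <node>0</node>
      </target>
    </memory>
  </devices>

If a further 1 GiB module were hotplugged into node 0, libvirt would append
a third such <memory> element and bump cell/@memory (and the overall
/memory total) by 1048576 KiB; unplug would do the reverse.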
Regards,
Daniel
-- 
|: http://berrange.com       -o-  http://www.flickr.com/photos/dberrange/ :|
|: http://libvirt.org        -o-             http://virt-manager.org      :|
|: http://autobuild.org      -o-       http://search.cpan.org/~danberr/   :|
|: http://entangle-photo.org -o-       http://live.gnome.org/gtk-vnc      :|