On Thu, Jul 20, 2017 at 03:03:25PM +0200, Wim ten Have wrote: > On Thu, 20 Jul 2017 11:10:31 +0100 > "Daniel P. Berrange" <berrange@xxxxxxxxxx> wrote: > > > On Thu, Jul 20, 2017 at 11:29:26AM +0200, Wim Ten Have wrote: > > > From: Wim ten Have <wim.ten.have@xxxxxxxxxx> > > > > > > The QEMU driver can erroneously allocate more vpus to a domain > > > than there are cpus in the domain if the <numa> element is used > > > to describe <cpu> element topology. Fix this by calculating > > > the number of cpus described in the <numa> element of a <cpu> > > > element and comparing it to the number of vcpus. If the number > > > of vcpus is greater than the number of cpus, cap the number of > > > vcpus to the number of cpus. > > > > > > This patch also adds a small libvirt documentation update under > > > the "CPU Allocation" section. > > > > > > Signed-off-by: Wim ten Have <wim.ten.have@xxxxxxxxxx> > > > --- > > > docs/formatdomain.html.in | 9 ++++++++- > > > src/conf/domain_conf.c | 14 +++++++++++--- > > > 2 files changed, 19 insertions(+), 4 deletions(-) > > > > > > diff --git a/docs/formatdomain.html.in b/docs/formatdomain.html.in > > > index 07208ee..3c85307 100644 > > > --- a/docs/formatdomain.html.in > > > +++ b/docs/formatdomain.html.in > > > @@ -501,7 +501,14 @@ > > > <dt><code>vcpu</code></dt> > > > <dd>The content of this element defines the maximum number of virtual > > > CPUs allocated for the guest OS, which must be between 1 and > > > - the maximum supported by the hypervisor. > > > + the maximum supported by the hypervisor. If a <code>numa</code> > > > + element is contained within the <code>cpu</code> element, the > > > + number of virtual CPUs to be allocated is compared with the sum > > > + of the <code>cpus</code> attributes across the <code>cell</code> > > > + elements within the <code>numa</code> element. If the number of > > > + virtual CPUs is greater than the sum of the <code>cpus</code> > > > + attributes, the number of virtual CPUs is capped at sum of the > > > + <code>cpus</code> attributes. > > > <dl> > > > <dt><code>cpuset</code></dt> > > > <dd> > > > diff --git a/src/conf/domain_conf.c b/src/conf/domain_conf.c > > > index 958a5b7..73d7f4f 100644 > > > --- a/src/conf/domain_conf.c > > > +++ b/src/conf/domain_conf.c > > > @@ -3844,12 +3844,20 @@ virDomainDefPostParseMemory(virDomainDefPtr def, > > > unsigned long long numaMemory = 0; > > > unsigned long long hotplugMemory = 0; > > > > > > - /* Attempt to infer the initial memory size from the sum NUMA memory sizes > > > - * in case ABI updates are allowed or the <memory> element wasn't specified */ > > > + /* Attempt to infer the initial memory size from the sum NUMA memory > > > + * sizes in case ABI updates are allowed or the <memory> element > > > + * wasn't specified. Also cap the vcpu count to the sum of NUMA cpus > > > + * allocated for all nodes. */ > > > if (def->mem.total_memory == 0 || > > > parseFlags & VIR_DOMAIN_DEF_PARSE_ABI_UPDATE || > > > - parseFlags & VIR_DOMAIN_DEF_PARSE_ABI_UPDATE_MIGRATION) > > > + parseFlags & VIR_DOMAIN_DEF_PARSE_ABI_UPDATE_MIGRATION) { > > > + unsigned int numaVcpus = 0; > > > + > > > + if ((numaVcpus = virDomainNumaGetCPUCountTotal(def->numa))) > > > + virDomainDefSetVcpus(def, numaVcpus); > > > + > > > numaMemory = virDomainNumaGetMemorySize(def->numa); > > > + } > > > > AFAICT, this scenario is simply a user configuration mistake, and so we > > should report an error message to them to fix it. > > Not to push but I think this is the correct way ... O:-). > > Ok, perhaps the documentation note/commit message should be slightly > rewritten aswell as the altered comment on specific code section. > > The change is _NOT_ changing final guest provided 'maxvcpus' but > 'vcpus' instead. This means that it adds the "current='#count'" under > the vcpu element if such is less then vcpu 'maxvcpus' provided count. > If equal there is no issue and if larger there is indeed a correct > error message (exceptional condition). > > Imagine simpel domain description _WITHOUT_ <numa> and below <vcpu> > element description; > > <vcpu placement='static' current='4'>16</vcpu> > > This nicely creates a guest with 4 domain CPUs added, where the > administrator can "hot-add" an additional 12 CPUs making the full > 'maxvcpus' defined 16. "hot-add" by virsh; > > virsh # setvcpus kvm26 16 > > Without the fix under former <numa> domain description libvirt would > bring whole '16' vcpus to the guest, and if there was a current value > given all by current on top of the <numa> <cell ... to NUMA-node0 > for that matter :-( so wrong; > > <vcpu placement='static'>16</vcpu> > .. > <numa> > <cell id='0' cpus='0' memory='262144' unit='KiB'/> > <cell id='1' cpus='1' memory='262144' unit='KiB'/> > <cell id='2' cpus='2' memory='262144' unit='KiB'/> > <cell id='3' cpus='3' memory='262144' unit='KiB'/> > </numa> This is saying that we have 16 CPU sockets, but you're only assigning NUMA affinity to 4 of the sockets. IMHO that is a configuration mistake. All 16 CPU sockets need to have affinity assigned, so that when they're later brought online they can be placed suitably on the host. > > With the fix, its post configuration action will now nicely rewrite > the <vcpu ... current='#count'> element as shown below; > > <vcpu placement='static' current='4'>16</vcpu> > .. > <numa> > <cell id='0' cpus='0' memory='262144' unit='KiB'/> > <cell id='1' cpus='1' memory='262144' unit='KiB'/> > <cell id='2' cpus='2' memory='262144' unit='KiB'/> > <cell id='3' cpus='3' memory='262144' unit='KiB'/> > </numa> IMHO that is still just as broken as the previous config, because it isn't saying how to set affinity on the remaining 12 vCPUs that can be brought online. > > In case "current='#count'" description is given where it does not > match the <numa> <cell ... #cpus> count then such is corrected to > cap the sum of all <numa> <cell ... #cpus>. > Perhaps that occurance should be rejected with an error message, then > such is not exceeding the domain administrators written <vcpu> 'maxvcpus' > element being '16' within my example above. That is a hard limit and > of course rejected with an error. > > I will not argue further ... O:-) I'll wait and if it is a _NO_ then > it is a _NO_! Regards, Daniel -- |: https://berrange.com -o- https://www.flickr.com/photos/dberrange :| |: https://libvirt.org -o- https://fstop138.berrange.com :| |: https://entangle-photo.org -o- https://www.instagram.com/dberrange :| -- libvir-list mailing list libvir-list@xxxxxxxxxx https://www.redhat.com/mailman/listinfo/libvir-list