On Wed, Jun 28, 2017 at 03:56:36PM +0200, Wim Ten Have wrote: > From: Wim ten Have <wim.ten.have@xxxxxxxxxx> > > Add libvirtd NUMA cell domain administration functionality to > describe underlying cell id sibling distances in full fashion > when configuring HVM guests. > > [below is an example of a 4 node setup] > > <cpu> > <numa> > <cell id='0' cpus='0' memory='2097152' unit='KiB'> > <distances> > <sibling id='0' value='10'/> > <sibling id='1' value='21'/> > <sibling id='2' value='31'/> > <sibling id='3' value='41'/> > </distances> > </cell> > <cell id='1' cpus='1' memory='2097152' unit='KiB'> > <distances> > <sibling id='0' value='21'/> > <sibling id='1' value='10'/> > <sibling id='2' value='31'/> > <sibling id='3' value='41'/> > </distances> > </cell> > <cell id='2' cpus='2' memory='2097152' unit='KiB'> > <distances> > <sibling id='0' value='31'/> > <sibling id='1' value='21'/> > <sibling id='2' value='10'/> > <sibling id='3' value='21'/> > </distances> > <cell id='3' cpus='3' memory='2097152' unit='KiB'> > <distances> > <sibling id='0' value='41'/> > <sibling id='1' value='31'/> > <sibling id='2' value='21'/> > <sibling id='3' value='10'/> > </distances> > </cell> > </numa> > </cpu> > > Changes under this commit concern all those that require adding > the valid data structures, virDomainNuma* functional routines and the > XML doc schema enhancements to enforce appropriate administration. > > These changes are accompanied with topic related documentation under > docs/formatdomain.html within the "CPU model and topology" extending the > "Guest NUMA topology" paragraph. > > For terminology we refer to sockets as "nodes" where access to each > others' distinct resources such as memory make them "siblings" with a > designated "distance" between them. A specific design is described under > the ACPI (Advanced Configuration and Power Interface Specification) > within the chapter explaining the system's SLIT (System Locality Distance > Information Table). > > These patches extend core libvirt's XML description of a virtual machine's > hardware to include NUMA distance information for sibling nodes, which > is then passed to Xen guests via libxl. Recently qemu landed support for > constructing the SLIT since commit 0f203430dd ("numa: Allow setting NUMA > distance for different NUMA nodes"), hence these core libvirt extensions > can also help other drivers in supporting this feature. > > These changes alter the docs/schemas/cputypes.rng enforcing > domain administration to follow the syntax below per numa cell id. > > These changes also alter docs/schemas/basictypes.rng to add > "numaDistanceValue" which is an "unsignedInt" with a minimum value > of 10 as 0-9 are reserved values and can not be used as System > Locality Distance Information Table data. > > Signed-off-by: Wim ten Have <wim.ten.have@xxxxxxxxxx> > --- > docs/formatdomain.html.in | 64 ++++++++++- > docs/schemas/basictypes.rng | 8 ++ > docs/schemas/cputypes.rng | 18 +++ > src/conf/cpu_conf.c | 2 +- > src/conf/numa_conf.c | 260 +++++++++++++++++++++++++++++++++++++++++++- > src/conf/numa_conf.h | 25 ++++- > src/libvirt_private.syms | 6 + > 7 files changed, 376 insertions(+), 7 deletions(-) > > diff --git a/docs/formatdomain.html.in b/docs/formatdomain.html.in > index 36bea67..b9736e3 100644 > --- a/docs/formatdomain.html.in > +++ b/docs/formatdomain.html.in > @@ -1510,7 +1510,69 @@ > </p> > > <p> > - This guest NUMA specification is currently available only for QEMU/KVM. > + This guest NUMA specification is currently available only for > + QEMU/KVM and Xen. Whereas Xen driver also allows for a distinct > + description of NUMA arranged <code>sibling</code> <code>cell</code> > + <code>distances</code> <span class="since">Since 3.6.0</span>. > + </p> > + > + <p> > + Under NUMA h/w architecture, distinct resources such as memory > + create a designated distance between <code>cell</code> and > + <code>siblings</code> that now can be described with the help of > + <code>distances</code>. A detailed describtion can be found within > + the ACPI (Advanced Configuration and Power Interface Specification) > + within the chapter explaining the system's SLIT (System Locality > + Distance Information Table). > + </p> > + > +<pre> > +... > +<cpu> > + ... > + <numa> > + <cell id='0' cpus='0,4-7' memory='512000' unit='KiB'> > + <distances> > + <sibling id='0' value='10'/> > + <sibling id='1' value='21'/> > + <sibling id='2' value='31'/> > + <sibling id='3' value='41'/> > + </distances> > + </cell> > + <cell id='1' cpus='1,8-10,12-15' memory='512000' unit='KiB' memAccess='shared'> > + <distances> > + <sibling id='0' value='21'/> > + <sibling id='1' value='10'/> > + <sibling id='2' value='21'/> > + <sibling id='3' value='31'/> > + </distances> > + </cell> > + <cell id='2' cpus='2,11' memory='512000' unit='KiB' memAccess='shared'> > + <distances> > + <sibling id='0' value='31'/> > + <sibling id='1' value='21'/> > + <sibling id='2' value='10'/> > + <sibling id='3' value='21'/> > + </distances> > + </cell> > + <cell id='3' cpus='3' memory='512000' unit='KiB'> > + <distances> > + <sibling id='0' value='41'/> > + <sibling id='1' value='31'/> > + <sibling id='2' value='21'/> > + <sibling id='3' value='10'/> > + </distances> > + </cell> > + </numa> > + ... > +</cpu> > +...</pre> > + > + <p> > + Under Xen driver, if no <code>distances</code> are given to describe > + the SLIT data between different cells, it will default to a scheme > + using 10 for local and 21 for other cells. Which is the assumption > + of guest OS when no SLIT is specified. > </p> We need to specify what happens if partial data is given too. In particular if you give a distance for A -> B, and not B -> A, then we should assume symmetry. Rather than having the defaults be duplicated in drivers, we should have a helper API in domain_conf.c that can be used to query the distance between 2 nodes that can be called by drivers. This can then apply defaults in a consistent manner. > @@ -66,6 +68,12 @@ struct _virDomainNuma { > virBitmapPtr nodeset; /* host memory nodes where this guest node resides */ > virDomainNumatuneMemMode mode; /* memory mode selection */ > virDomainMemoryAccess memAccess; /* shared memory access configuration */ > + > + struct _virDomainNumaDistance { > + unsigned int value; /* locality value for node i*j */ Strictly speaking distances have an orderng, ie this is 'node i -> j' in that specific direction. 'node j -> i' can be different. > + unsigned int cellid; Needs extra indent - 4 spaces per indent level > + } *distances; /* remote node distances */ > + size_t ndistances; > } *mem_nodes; /* guest node configuration */ > size_t nmem_nodes; > > @@ -686,6 +694,95 @@ virDomainNumatuneNodesetIsAvailable(virDomainNumaPtr numatune, > } > + for (i = 0; i < ndistances; i++) { > + unsigned int sibling_id = i, sibling_value; > + > + /* siblings are in order of parsing or explicitly numbered */ > + if ((tmp = virXMLPropString(nodes[i], "id"))) { > + if (virStrToLong_uip(tmp, NULL, 10, &sibling_id) < 0) { > + virReportError(VIR_ERR_XML_ERROR, > + _("Invalid 'id' attribute in NUMA distances for sibling: '%s'"), > + tmp); > + goto cleanup; > + } > + > + if (sibling_id >= ndistances) { > + virReportError(VIR_ERR_XML_ERROR, "%s", > + _("Exactly one 'sibling' element per NUMA distance " > + "is allowed, non-contiguous ranges or ranges not " > + "starting from 0 are not allowed")); We should allow non-contiguous ranges, starting from any node id. ie users should be free to provide sparse information, relying on defaults for the rest. > + goto cleanup; > + } > + } > + VIR_FREE(tmp); > + > + /* We need a locality value */ > + if (!(tmp = virXMLPropString(nodes[i], "value"))) { > + virReportError(VIR_ERR_XML_ERROR, > + _("Missing 'value' attribute in NUMA distances for sibling id: '%d'"), > + sibling_id); > + goto cleanup; > + } > + > + /* It needs to be applicable */ > + if (virStrToLong_uip(tmp, NULL, 10, &sibling_value) < 0) { > + virReportError(VIR_ERR_XML_ERROR, > + _("Invalid 'value' attribute in NUMA distances for sibling id: '%d'"), > + sibling_id); > + goto cleanup; > + } > + VIR_FREE(tmp); > + > + distances[sibling_id].cellid = sibling_id; > + distances[sibling_id].value = sibling_value; > + } > + > + def->mem_nodes[cur_cell].distances = distances; > + def->mem_nodes[cur_cell].ndistances = ndistances; > + > + ret = 0; > + > + cleanup: > + if (ret) > + VIR_FREE(distances); > + VIR_FREE(nodes); > + VIR_FREE(tmp); > + > + return ret; > +} > + > int > virDomainNumaDefCPUParseXML(virDomainNumaPtr def, > xmlXPathContextPtr ctxt) > @@ -788,6 +885,14 @@ virDomainNumaDefCPUParseXML(virDomainNumaPtr def, > def->mem_nodes[cur_cell].memAccess = rc; > VIR_FREE(tmp); > } > + > + /* Parse NUMA distances info */ > + if (virDomainNumaDefNodeDistanceParseXML(def, ctxt, cur_cell) < 0) { > + virReportError(VIR_ERR_CONFIG_UNSUPPORTED, > + _("NUMA cell %d has incorrect 'distances' configured"), > + cur_cell); > + goto cleanup; > + } This method already reported an detailed error message, which is being overwritten here with a useless generic message. > +size_t > +virDomainNumaSetNodeCount(virDomainNumaPtr numa, size_t nmem_nodes) > +{ > + if (!numa || !nmem_nodes) > + return 0; This error condition doesn't report any error message... > + > + if (numa->mem_nodes) { > + virReportError(VIR_ERR_INTERNAL_ERROR, "%s", > + _("Cannot alter an existing nmem_nodes set")); > + return 0; > + } ...but this does... > + > + if (VIR_ALLOC_N(numa->mem_nodes, nmem_nodes) < 0) > + return 0; ...and so does this. Callers need to have consistent error reporting for all scenarios > + > + numa->nmem_nodes = nmem_nodes; > + > + return numa->nmem_nodes; > +} > + > + > +size_t > +virDomainNumaGetNodeDistance(virDomainNumaPtr numa, > + size_t node, > + size_t cellid) > +{ > + virDomainNumaDistancePtr distances; > + > + if (!numa) > + return 0; Just mark 'numa' parameters as ATTRIBUTE_NONNULL and make sure callers don't run this method if numa is not configured. > + > + distances = numa->mem_nodes[node].distances; You've not bounds-checked 'node' vs mem_nodes size. > + if (!numa->mem_nodes[node].ndistances || !distances) > + return 0; This is where we should plug in defaults for partially specified data. > + > + return distances[cellid].value; > +} What are calling semantics here ? I think we should > + > + > +size_t > +virDomainNumaSetNodeDistance(virDomainNumaPtr numa, > + size_t node, > + size_t cellid, > + unsigned int value) > +{ > + virDomainNumaDistancePtr distances; > + > + /* > + * Advanced Configuration and Power Interface > + * Specification version 6.1. Chapter 5.2.17 > + * System Locality Distance Information Table > + * ... Distance values of 0-9 are reserved. > + */ > + if (!numa || value < 10) > + return 0; ATTRIBUTE_NONNULL instead, and report an error for values < 10 > + > + distances = numa->mem_nodes[node].distances; > + > + if (numa->mem_nodes[node].ndistances > 0 && > + distances[cellid].value) > + return 0; I'm not sure what this check is doing - it seems to be refusing to let you change the value if one is already set. I don't see the point of that, but if it is needed then we should report errors > + > + distances[cellid].cellid = cellid; > + distances[cellid].value = value; > + > + return distances[cellid].value; > +} > + > + > +size_t > +virDomainNumaGetNodeDistanceCount(virDomainNumaPtr numa, > + size_t node) > +{ > + if (!numa) > + return 0; ATTRIBUTE_NONNULL instead > + > + return numa->mem_nodes[node].ndistances; > +} > + > + > +size_t > +virDomainNumaSetNodeDistanceCount(virDomainNumaPtr numa, > + size_t node, > + size_t ndistances) > +{ > + virDomainNumaDistancePtr distances; > + > + if (!numa || !ndistances) > + return 0; ATTRIBUTE_NONNULL for first. For the second we should presumably free any existing distance data > + > + distances = numa->mem_nodes[node].distances; > + if (distances) { > + virReportError(VIR_ERR_INTERNAL_ERROR, > + _("Cannot alter an existing nmem_nodes distances set for node: %zu"), > + node); > + return 0; > + } > + > + if (VIR_ALLOC_N(distances, ndistances) < 0) > + return 0; > + > + numa->mem_nodes[node].distances = distances; > + numa->mem_nodes[node].ndistances = ndistances; > + > + return numa->mem_nodes[node].ndistances; > +} > + > + > virBitmapPtr > virDomainNumaGetNodeCpumask(virDomainNumaPtr numa, > size_t node) > @@ -937,6 +1175,20 @@ virDomainNumaGetNodeCpumask(virDomainNumaPtr numa, > } > > > +virBitmapPtr > +virDomainNumaSetNodeCpumask(virDomainNumaPtr numa, > + size_t node, > + virBitmapPtr cpumask) > +{ > + if (!numa || !cpumask) > + return NULL; > + > + numa->mem_nodes[node].cpumask = cpumask; > + > + return numa->mem_nodes[node].cpumask; > +} > + Regards, Daniel -- |: https://berrange.com -o- https://www.flickr.com/photos/dberrange :| |: https://libvirt.org -o- https://fstop138.berrange.com :| |: https://entangle-photo.org -o- https://www.instagram.com/dberrange :| -- libvir-list mailing list libvir-list@xxxxxxxxxx https://www.redhat.com/mailman/listinfo/libvir-list