From: Wim ten Have <wim.ten.have@xxxxxxxxxx>

Add libvirtd NUMA cell domain administration functionality to
describe underlying cell id sibling distances in full fashion
when configuring HVM guests.

Schema updates are made to docs/schemas/cputypes.rng, enforcing that
domain administration follows the syntax shown below under the numa
cell id, and to docs/schemas/basictypes.rng to add "numaDistanceValue".

The minimum value is 10, representing LOCAL_DISTANCE; values 0-9 are
reserved and cannot be used as System Locality Distance Information.
A value of 20 represents the default setting of REMOTE_DISTANCE, and
the maximum value of 255 represents UNREACHABLE.

Effectively any cell sibling can be assigned a distance value where
practically 'LOCAL_DISTANCE <= value <= UNREACHABLE'.

[below is an example of a 4 node setup]

  <cpu>
    <numa>
      <cell id='0' cpus='0' memory='2097152' unit='KiB'>
        <distances>
          <sibling id='0' value='10'/>
          <sibling id='1' value='21'/>
          <sibling id='2' value='31'/>
          <sibling id='3' value='41'/>
        </distances>
      </cell>
      <cell id='1' cpus='1' memory='2097152' unit='KiB'>
        <distances>
          <sibling id='0' value='21'/>
          <sibling id='1' value='10'/>
          <sibling id='2' value='21'/>
          <sibling id='3' value='31'/>
        </distances>
      </cell>
      <cell id='2' cpus='2' memory='2097152' unit='KiB'>
        <distances>
          <sibling id='0' value='31'/>
          <sibling id='1' value='21'/>
          <sibling id='2' value='10'/>
          <sibling id='3' value='21'/>
        </distances>
      </cell>
      <cell id='3' cpus='3' memory='2097152' unit='KiB'>
        <distances>
          <sibling id='0' value='41'/>
          <sibling id='1' value='31'/>
          <sibling id='2' value='21'/>
          <sibling id='3' value='10'/>
        </distances>
      </cell>
    </numa>
  </cpu>

Whenever a sibling id matches the id of its enclosing cell,
LOCAL_DISTANCE applies, and for any sibling id not covered a default
of REMOTE_DISTANCE is used for internal computations.

Signed-off-by: Wim ten Have <wim.ten.have@xxxxxxxxxx>
---
Changes on v1:
- Add changes to docs/formatdomain.html.in describing schema update.
Changes on v2:
- Automatically apply distance symmetry maintaining cell <-> sibling.
- Check for maximum '255' on numaDistanceValue.
- Automatically complete empty distance ranges.
- Check that sibling_id's are in range with cell identifiers.
- Allow non-contiguous ranges, starting from any node id.
- Respect parameters as ATTRIBUTE_NONNULL fix functions and callers.
- Add and apply topology for LOCAL_DISTANCE=10 and REMOTE_DISTANCE=20.
Changes on v3:
- Add UNREACHABLE if one locality is unreachable from another.
- Add code cleanup aligning function naming in a separated patch.
- Add numa related driver code in a separated patch.
- Remove <choice> from numaDistanceValue schema/basictypes.rng.
- Correct doc changes.
---
 docs/formatdomain.html.in   |  63 +++++++++++++-
 docs/schemas/basictypes.rng |   7 ++
 docs/schemas/cputypes.rng   |  18 ++++
 src/conf/numa_conf.c        | 200 +++++++++++++++++++++++++++++++++++++++++++-
 4 files changed, 284 insertions(+), 4 deletions(-)

diff --git a/docs/formatdomain.html.in b/docs/formatdomain.html.in
index 8ca7637..c453d44 100644
--- a/docs/formatdomain.html.in
+++ b/docs/formatdomain.html.in
@@ -1529,7 +1529,68 @@
     </p>
 
     <p>
-      This guest NUMA specification is currently available only for QEMU/KVM.
+      This guest NUMA specification is currently available only for
+      QEMU/KVM and Xen. The Xen driver also allows for a distinct
+      description of NUMA arranged <code>sibling</code> <code>cell</code>
+      <code>distances</code> <span class="since">Since 3.6.0</span>.
+    </p>
+
+    <p>
+      Under NUMA h/w architecture, distinct resources such as memory
+      create a designated distance between <code>cell</code> and
+      <code>siblings</code> that now can be described with the help of
+      <code>distances</code>. A detailed description can be found within
+      the ACPI (Advanced Configuration and Power Interface Specification)
+      within the chapter explaining the system's SLIT (System Locality
+      Distance Information Table).
+    </p>
+
+<pre>
+...
+<cpu>
+  ...
+  <numa>
+    <cell id='0' cpus='0,4-7' memory='512000' unit='KiB'>
+      <distances>
+        <sibling id='0' value='10'/>
+        <sibling id='1' value='21'/>
+        <sibling id='2' value='31'/>
+        <sibling id='3' value='41'/>
+      </distances>
+    </cell>
+    <cell id='1' cpus='1,8-10,12-15' memory='512000' unit='KiB' memAccess='shared'>
+      <distances>
+        <sibling id='0' value='21'/>
+        <sibling id='1' value='10'/>
+        <sibling id='2' value='21'/>
+        <sibling id='3' value='31'/>
+      </distances>
+    </cell>
+    <cell id='2' cpus='2,11' memory='512000' unit='KiB' memAccess='shared'>
+      <distances>
+        <sibling id='0' value='31'/>
+        <sibling id='1' value='21'/>
+        <sibling id='2' value='10'/>
+        <sibling id='3' value='21'/>
+      </distances>
+    </cell>
+    <cell id='3' cpus='3' memory='512000' unit='KiB'>
+      <distances>
+        <sibling id='0' value='41'/>
+        <sibling id='1' value='31'/>
+        <sibling id='2' value='21'/>
+        <sibling id='3' value='10'/>
+      </distances>
+    </cell>
+  </numa>
+  ...
+</cpu>
+...</pre>
+
+    <p>
+      Under the Xen driver, if no <code>distances</code> are given to describe
+      the SLIT data between different cells, it will default to a scheme
+      using 10 for local and 20 for remote distances.
     </p>
 
     <h3><a id="elementsEvents">Events configuration</a></h3>
diff --git a/docs/schemas/basictypes.rng b/docs/schemas/basictypes.rng
index 1ea667c..1a18cd3 100644
--- a/docs/schemas/basictypes.rng
+++ b/docs/schemas/basictypes.rng
@@ -77,6 +77,13 @@
     </choice>
   </define>
 
+  <define name="numaDistanceValue">
+    <data type="unsignedInt">
+      <param name="minInclusive">10</param>
+      <param name="maxInclusive">255</param>
+    </data>
+  </define>
+
   <define name="pciaddress">
     <optional>
       <attribute name="domain">
diff --git a/docs/schemas/cputypes.rng b/docs/schemas/cputypes.rng
index 3eef16a..c45b6df 100644
--- a/docs/schemas/cputypes.rng
+++ b/docs/schemas/cputypes.rng
@@ -129,6 +129,24 @@
           </choice>
         </attribute>
       </optional>
+      <optional>
+        <element name="distances">
+          <oneOrMore>
+            <ref name="numaDistance"/>
+          </oneOrMore>
+        </element>
+      </optional>
+    </element>
+  </define>
+
+  <define name="numaDistance">
+    <element name="sibling">
+      <attribute name="id">
+        <ref name="unsignedInt"/>
+      </attribute>
+      <attribute name="value">
+        <ref name="numaDistanceValue"/>
+      </attribute>
     </element>
   </define>
 
diff --git a/src/conf/numa_conf.c b/src/conf/numa_conf.c
index b71dc01..5db4311 100644
--- a/src/conf/numa_conf.c
+++ b/src/conf/numa_conf.c
@@ -29,6 +29,15 @@
 #include "virnuma.h"
 #include "virstring.h"
 
+/*
+ * Distance definitions defined Conform ACPI 2.0 SLIT.
+ * See include/linux/topology.h
+ */
+#define LOCAL_DISTANCE          10
+#define REMOTE_DISTANCE         20
+/* SLIT entry value is a one-byte unsigned integer. */
+#define UNREACHABLE            255
+
 #define VIR_FROM_THIS VIR_FROM_DOMAIN
 
 VIR_ENUM_IMPL(virDomainNumatuneMemMode,
@@ -48,6 +57,8 @@ VIR_ENUM_IMPL(virDomainMemoryAccess, VIR_DOMAIN_MEMORY_ACCESS_LAST,
               "shared",
               "private")
 
+typedef struct _virDomainNumaDistance virDomainNumaDistance;
+typedef virDomainNumaDistance *virDomainNumaDistancePtr;
 
 typedef struct _virDomainNumaNode virDomainNumaNode;
 typedef virDomainNumaNode *virDomainNumaNodePtr;
@@ -66,6 +77,12 @@ struct _virDomainNuma {
         virBitmapPtr nodeset;   /* host memory nodes where this guest node resides */
         virDomainNumatuneMemMode mode;  /* memory mode selection */
         virDomainMemoryAccess memAccess; /* shared memory access configuration */
+
+        struct _virDomainNumaDistance {
+            unsigned int value; /* locality value for node i->j or j->i */
+            unsigned int cellid;
+        } *distances;           /* remote node distances */
+        size_t ndistances;
     } *mem_nodes;           /* guest node configuration */
     size_t nmem_nodes;
 
@@ -686,6 +703,153 @@ virDomainNumatuneNodesetIsAvailable(virDomainNumaPtr numatune,
 }
 
 
+static int
+virDomainNumaDefNodeDistanceParseXML(virDomainNumaPtr def,
+                                     xmlXPathContextPtr ctxt,
+                                     unsigned int cur_cell)
+{
+    int ret = -1;
+    int sibling;
+    char *tmp = NULL;
+    xmlNodePtr *nodes = NULL;
+    size_t i, ndistances = def->nmem_nodes;
+
+    if (!ndistances)
+        return 0;
+
+    /* check if NUMA distances definition is present */
+    if (!virXPathNode("./distances[1]", ctxt))
+        return 0;
+
+    if ((sibling = virXPathNodeSet("./distances[1]/sibling", ctxt, &nodes)) <= 0) {
+        virReportError(VIR_ERR_XML_ERROR, "%s",
+                       _("NUMA distances defined without siblings"));
+        goto cleanup;
+    }
+
+    for (i = 0; i < sibling; i++) {
+        virDomainNumaDistancePtr ldist, rdist;
+        unsigned int sibling_id, sibling_value;
+
+        /* siblings are in order of parsing or explicitly numbered */
+        if (!(tmp = virXMLPropString(nodes[i], "id"))) {
+            virReportError(VIR_ERR_XML_ERROR,
+                           _("Missing 'id' attribute in NUMA "
+                             "distances under 'cell id %d'"),
+                           cur_cell);
+            goto cleanup;
+        }
+
+        /* The "id" needs to be applicable */
+        if (virStrToLong_uip(tmp, NULL, 10, &sibling_id) < 0) {
+            virReportError(VIR_ERR_XML_ERROR,
+                           _("Invalid 'id' attribute in NUMA "
+                             "distances for sibling: '%s'"),
+                           tmp);
+            goto cleanup;
+        }
+        VIR_FREE(tmp);
+
+        /* The "id" needs to be within numa/cell range */
+        if (sibling_id >= ndistances) {
+            virReportError(VIR_ERR_XML_ERROR,
+                           _("There is no cell administrated matching "
+                             "'sibling_id %d' under NUMA 'cell id %d' "),
+                           sibling_id, cur_cell);
+            goto cleanup;
+        }
+
+        /* We need a locality value. Check and correct
+         * distance to local and distance to remote node.
+         */
+        if (!(tmp = virXMLPropString(nodes[i], "value"))) {
+            virReportError(VIR_ERR_XML_ERROR,
+                           _("Missing 'value' attribute in NUMA distances "
+                             "under 'cell id %d' for 'sibling id %d'"),
+                           cur_cell, sibling_id);
+            goto cleanup;
+        }
+
+        /* The "value" needs to be applicable */
+        if (virStrToLong_uip(tmp, NULL, 10, &sibling_value) < 0) {
+            virReportError(VIR_ERR_XML_ERROR,
+                           _("Invalid 'value' attribute in NUMA "
+                             "distances for value: '%s'"),
+                           tmp);
+            goto cleanup;
+        }
+        VIR_FREE(tmp);
+
+        /* LOCAL_DISTANCE <= "value" <= UNREACHABLE */
+        if (sibling_value < LOCAL_DISTANCE ||
+            sibling_value > UNREACHABLE) {
+            virReportError(VIR_ERR_XML_ERROR,
+                           _("Out of range value '%d' set for "
+                             "'sibling id %d' under NUMA 'cell id %d' "),
+                           sibling_value, sibling_id, cur_cell);
+            goto cleanup;
+        }
+
+        ldist = def->mem_nodes[cur_cell].distances;
+        if (!ldist) {
+            if (def->mem_nodes[cur_cell].ndistances) {
+                virReportError(VIR_ERR_XML_ERROR,
+                               _("Invalid 'ndistances' set in NUMA "
+                                 "distances for sibling id: '%d'"),
+                               cur_cell);
+                goto cleanup;
+            }
+
+            if (VIR_ALLOC_N(ldist, ndistances) < 0)
+                goto cleanup;
+
+            if (!ldist[cur_cell].value)
+                ldist[cur_cell].value = LOCAL_DISTANCE;
+            ldist[cur_cell].cellid = cur_cell;
+            def->mem_nodes[cur_cell].ndistances = ndistances;
+        }
+
+        ldist[sibling_id].cellid = sibling_id;
+        ldist[sibling_id].value = sibling_value;
+        def->mem_nodes[cur_cell].distances = ldist;
+
+        rdist = def->mem_nodes[sibling_id].distances;
+        if (!rdist) {
+            if (def->mem_nodes[sibling_id].ndistances) {
+                virReportError(VIR_ERR_XML_ERROR,
+                               _("Invalid 'ndistances' set in NUMA "
+                                 "distances for sibling id: '%d'"),
+                               sibling_id);
+                goto cleanup;
+            }
+
+            if (VIR_ALLOC_N(rdist, ndistances) < 0)
+                goto cleanup;
+
+            if (!rdist[sibling_id].value)
+                rdist[sibling_id].value = LOCAL_DISTANCE;
+            rdist[sibling_id].cellid = sibling_id;
+            def->mem_nodes[sibling_id].ndistances = ndistances;
+        }
+
+        rdist[cur_cell].cellid = cur_cell;
+        rdist[cur_cell].value = sibling_value;
+        def->mem_nodes[sibling_id].distances = rdist;
+    }
+
+    ret = 0;
+
+ cleanup:
+    if (ret) {
+        for (i = 0; i < ndistances; i++)
+            VIR_FREE(def->mem_nodes[i].distances);
+    }
+    VIR_FREE(nodes);
+    VIR_FREE(tmp);
+
+    return ret;
+}
+
 int
 virDomainNumaDefCPUParseXML(virDomainNumaPtr def,
                             xmlXPathContextPtr ctxt)
@@ -694,7 +858,7 @@ virDomainNumaDefCPUParseXML(virDomainNumaPtr def,
     xmlNodePtr oldNode = ctxt->node;
     char *tmp = NULL;
     int n;
-    size_t i;
+    size_t i, j;
     int ret = -1;
 
     /* check if NUMA definition is present */
@@ -712,7 +876,6 @@ virDomainNumaDefCPUParseXML(virDomainNumaPtr def,
     def->nmem_nodes = n;
 
     for (i = 0; i < n; i++) {
-        size_t j;
         int rc;
         unsigned int cur_cell = i;
 
@@ -788,6 +951,10 @@ virDomainNumaDefCPUParseXML(virDomainNumaPtr def,
             def->mem_nodes[cur_cell].memAccess = rc;
             VIR_FREE(tmp);
         }
+
+        /* Parse NUMA distances info */
+        if (virDomainNumaDefNodeDistanceParseXML(def, ctxt, cur_cell) < 0)
+            goto cleanup;
     }
 
     ret = 0;
@@ -815,6 +982,8 @@ virDomainNumaDefCPUFormatXML(virBufferPtr buf,
     virBufferAddLit(buf, "<numa>\n");
     virBufferAdjustIndent(buf, 2);
     for (i = 0; i < ncells; i++) {
+        int ndistances;
+
         memAccess = virDomainNumaGetNodeMemoryAccessMode(def, i);
 
         if (!(cpustr = virBitmapFormat(virDomainNumaGetNodeCpumask(def, i))))
@@ -829,7 +998,32 @@ virDomainNumaDefCPUFormatXML(virBufferPtr buf,
         if (memAccess)
             virBufferAsprintf(buf, " memAccess='%s'",
                               virDomainMemoryAccessTypeToString(memAccess));
-        virBufferAddLit(buf, "/>\n");
+
+        ndistances = def->mem_nodes[i].ndistances;
+        if (!ndistances) {
+            virBufferAddLit(buf, "/>\n");
+        } else {
+            size_t j;
+            virDomainNumaDistancePtr distances = def->mem_nodes[i].distances;
+
+            virBufferAddLit(buf, ">\n");
+            virBufferAdjustIndent(buf, 2);
+            virBufferAddLit(buf, "<distances>\n");
+            virBufferAdjustIndent(buf, 2);
+            for (j = 0; j < ndistances; j++) {
+                if (distances[j].value) {
+                    virBufferAddLit(buf, "<sibling");
+                    virBufferAsprintf(buf, " id='%d'", distances[j].cellid);
+                    virBufferAsprintf(buf, " value='%d'", distances[j].value);
+                    virBufferAddLit(buf, "/>\n");
+                }
+            }
+            virBufferAdjustIndent(buf, -2);
+            virBufferAddLit(buf, "</distances>\n");
+            virBufferAdjustIndent(buf, -2);
+            virBufferAddLit(buf, "</cell>\n");
+        }
+
         VIR_FREE(cpustr);
     }
     virBufferAdjustIndent(buf, -2);
-- 
2.9.5

--
libvir-list mailing list
libvir-list@xxxxxxxxxx
https://www.redhat.com/mailman/listinfo/libvir-list
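
Illustration only, not part of the patch: the rules described above
(range check 'LOCAL_DISTANCE <= value <= UNREACHABLE', sibling ids
limited to known cells, cell <-> sibling symmetry, LOCAL_DISTANCE on the
diagonal and REMOTE_DISTANCE for anything not covered) can be sketched
as a small standalone C routine. The names "struct sibling_entry" and
"build_distance_matrix" are hypothetical, and the sketch assumes a flat
row-major matrix that is filled eagerly with defaults, whereas the patch
keeps per-node arrays and leaves unset entries at zero for later
computation.

```c
#include <assert.h>
#include <stddef.h>

/* Same values as the definitions added to src/conf/numa_conf.c. */
#define LOCAL_DISTANCE   10
#define REMOTE_DISTANCE  20
#define UNREACHABLE     255

/* Hypothetical type: one <sibling> element together with its cell. */
struct sibling_entry {
    unsigned int cell;   /* id of the enclosing <cell> */
    unsigned int id;     /* <sibling id='...'> */
    unsigned int value;  /* <sibling value='...'> */
};

/*
 * Hypothetical helper: fill an ncells x ncells row-major matrix.
 * Returns 0 on success, -1 when an id is outside the cell range or a
 * value is outside LOCAL_DISTANCE..UNREACHABLE.
 */
static int
build_distance_matrix(unsigned int *matrix, size_t ncells,
                      const struct sibling_entry *entries, size_t nentries)
{
    size_t i, j;

    assert(matrix != NULL);

    /* Defaults: local on the diagonal, remote everywhere else. */
    for (i = 0; i < ncells; i++) {
        for (j = 0; j < ncells; j++)
            matrix[i * ncells + j] =
                (i == j) ? LOCAL_DISTANCE : REMOTE_DISTANCE;
    }

    for (i = 0; i < nentries; i++) {
        const struct sibling_entry *e = &entries[i];

        /* Both ids must refer to an existing cell. */
        if (e->cell >= ncells || e->id >= ncells)
            return -1;

        /* LOCAL_DISTANCE <= value <= UNREACHABLE */
        if (e->value < LOCAL_DISTANCE || e->value > UNREACHABLE)
            return -1;

        /* Apply the value in both directions; a later entry wins. */
        matrix[e->cell * ncells + e->id] = e->value;
        matrix[e->id * ncells + e->cell] = e->value;
    }

    return 0;
}
```

With three cells and only the entries (cell 0, sibling 1, 21) and
(cell 0, sibling 2, 31) supplied, the sketch mirrors both values into
the rows of cells 1 and 2 and leaves the 1 <-> 2 pair at the
REMOTE_DISTANCE default.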