On 05/07/2012 02:34 AM, Osier Yang wrote: > Though numad will manage the memory allocation of task dynamically, > but it wants management application (libvirt) to pre-set the memory > policy according to the advisory nodeset returned from querying numad, > (just like pre-bind CPU nodeset for domain process), and thus the > performance could benifit much more from it. s/benifit/benefit/ > > This patch introduces new XML tag 'placement', value 'auto' indicates > whether to set the memory policy with the advisory nodeset from numad, > and its value defaults to 'static'. e.g. > > <numatune> > <memory placement='auto' mode='interleave'/> > </numatune> > > Just like what current "numatune" does, the 'auto' numa memory policy > setting uses libnuma's API too. > > So, to full drive numad, one needs to specify placement='auto' for > both "<vcpu>" and "<numatune><memory .../></numatune>". It's a bit > inconvenient, but makes sense from semantics' point of view. > > --- > An alternative way is to not introduce the new XML tag, and pre-set > the memory policy implicitly with "<vcpu placement='auto'>4</vcpu>", > but IMHO it implies too much, and I'd not like go this way unless > the new XML tag is not accepted. I think we need a hybrid approach. Existing users wrote XML that used just <vcpu placement='auto'> but expecting full numad support. Normally, numad is most useful if it controls _both_ memory (<numatune>) and vcpu placement together, but it is feasible to control either one in isolation. Therefore, I think what we need to do is specify that on input, we have four situations: 1. /domain/vcpu@placement is missing /domain/numatune/memory@placement is missing We default to mode='static' in both places, and avoid numad 2. /domain/vcpu@placement is present /domain/numatune/memory@placement is missing We copy vcpu placement over to numa placement (and if placement='auto', that means we use numad for everything) 3. /domain/vcpu@placement is missing /domain/numatune/memory@placement is present We copy numa placement over to vcpu placement (and if placement='auto', that means we use numad for everything) 4. /domain/vcpu@placement is present /domain/numatune/memory@placement is present We use exactly what the user specifies (and if only one of the two features is placement='auto', then the other feature avoids numad) Mode 3 and 4 would be the new modes possible by the XML added in this patch. Mode 1 is the default for XML in use by guests before numad integration, and Mode 2 is the XML for guests attempting to use numad in the 0.9.11 release; I'm okay changing the semantics of mode 2 to be easier to use (because you can use mode 4 if you really wanted the unusual semantics of numad vcpu placement but not memory placement). Then on output, we always output mode in both <vcpu> and <numatune>, as in mode 4 (yeah, that probably means touching a lot of tests, but that's life when we add new XML). I'm still a bit torn on whether this must go in before 0.9.12, since we're past rc1. But since it looks like this is more of an oversight (our numad usage in 0.9.11 is not very useful, since it missed memory placement), this can be argued to be a bug fix rather than a new feature, even though it is adding XML, so I will probably be okay with a v2 patch going in before the final 0.9.12 release. On to the actual code of v1: > > diff --git a/docs/formatdomain.html.in b/docs/formatdomain.html.in > index e1fe0c4..01b3124 100644 > --- a/docs/formatdomain.html.in > +++ b/docs/formatdomain.html.in > @@ -580,9 +580,14 @@ > The optional <code>memory</code> element specifies how to allocate memory > for the domain process on a NUMA host. It contains two attributes, s/two attributes,/several optional attributes./ > attribute <code>mode</code> is either 'interleave', 'strict', s/attribute/Attribute/ > - or 'preferred', > - attribute <code>nodeset</code> specifies the NUMA nodes, it leads same > - syntax with attribute <code>cpuset</code> of element <code>vcpu</code>. > + or 'preferred', attribute <code>nodeset</code> specifies the NUMA nodes, s/'preferred', attribute/'preferred'. Attribute/ > + it leads same syntax with attribute <code>cpuset</code> of element s/it leads same syntax with/using the same syntax as/ > + <code>vcpu</code>, the optional attribute <code>placement</code> can be s/</code>, the/</code>. The/ > + used to indicate the memory placement mode for domain process, its value > + can be either "static" or "auto", defaults to "static". "auto" indicates Given my above comments, this would default to the same placement as used in <vcpu>. > + the domain process will only allocate memory from the advisory nodeset > + returned from querying numad, and the value of attribute <code>nodeset</code> > + will be ignored if it's specified. > <span class='since'>Since 0.9.3</span> Mention that attribute placement is since 0.9.12. > </dd> > </dl> > diff --git a/docs/schemas/domaincommon.rng b/docs/schemas/domaincommon.rng > index 8419ccc..9b509f1 100644 > --- a/docs/schemas/domaincommon.rng > +++ b/docs/schemas/domaincommon.rng > @@ -562,16 +562,35 @@ > <element name="numatune"> > <optional> > <element name="memory"> > - <attribute name="mode"> > - <choice> > - <value>strict</value> > - <value>preferred</value> > - <value>interleave</value> > - </choice> > - </attribute> > - <attribute name="nodeset"> > - <ref name="cpuset"/> > - </attribute> > + <choice> > + <group> > + <attribute name="mode"> > + <choice> > + <value>strict</value> > + <value>preferred</value> > + <value>interleave</value> > + </choice> > + </attribute> > + <attribute name='placement'> > + <choice> > + <value>static</value> > + <value>auto</value> > + </choice> > + </attribute> > + </group> > + <group> > + <attribute name="mode"> > + <choice> > + <value>strict</value> > + <value>preferred</value> > + <value>interleave</value> > + </choice> > + </attribute> > + <attribute name="nodeset"> > + <ref name="cpuset"/> > + </attribute> > + </group> > + </choice> It looks like you are requiring mode='...' in both choices, so the real choice is whether you permit placement or nodeset. However, this RNG will forbid placement='static' nodeset='0-3', so you didn't get it quite right. I almost think you want: <attribute name='mode'> <choice> <value>strict</value> <value>preferred</value> <value>interleave</value> </choice> </attribute> <choice> <group> <optional> <attribute name='placement'> <value>static</value> </attribute> </optional> <optional> <attribute name='nodeset'> <ref name='cpuset'/> </attribute> </optional> </group> <attribute name='placement'> <value>auto</value> </attribute> </choice> which says that both placement and nodeset can be missing (by my new semantics, that would mean that placement defaults to <vcpu>, otherwise to static); or that nodeset can be present (placement would be implied as static), or that placement can explicitly be set to static. Meanwhile, it says that placement='auto' cannot be mixed with nodeset='...' (is that right, or do we output nodeset even with numad automatic placement to show the user what they actually have at the current moment?). > +++ b/libvirt.spec.in > @@ -454,6 +454,7 @@ BuildRequires: scrub > > %if %{with_numad} > BuildRequires: numad > +BuildRequires: numactl-devel > %endif This should be split into an independent fix. Furthermore, it needs to be gated - in F16, 'numad' provided the development tools that were split off into 'numactl-devel' for F17. So this really needs to be a versioned check: %if %{with_numad} %if 0%{?fedora} >= 17 BuildRequires: numactl-devel %else BuildRequires: numad %endif > @@ -8055,13 +8052,40 @@ static virDomainDefPtr virDomainDefParseXML(virCapsPtr caps, > virDomainReportError(VIR_ERR_XML_ERROR, > _("Unsupported NUMA memory " > "tuning mode '%s'"), > - tmp); > + tmp); Spurious indentation change. The old spacing was correct. > + > + /* "nodeset" leads same syntax with "cpuset". */ s/leads same syntax with/uses the same syntax as/ > struct _virDomainTimerCatchupDef { > @@ -1504,6 +1511,7 @@ struct _virDomainNumatuneDef { > struct { > char *nodemask; > int mode; > + int placement_mode; Add a comment mentioning which enum values are valid for assignment to this variable. -- Eric Blake eblake@xxxxxxxxxx +1-919-301-3266 Libvirt virtualization library http://libvirt.org
Attachment:
signature.asc
Description: OpenPGP digital signature
-- libvir-list mailing list libvir-list@xxxxxxxxxx https://www.redhat.com/mailman/listinfo/libvir-list