Re: [libvirt PATCH 10/11] domain_capabilities: Add blockers attribute for CPU models

Jiri Denemark <jdenemar@xxxxxxxxxx> · Wed, 5 Oct 2022 10:49:43 +0200

On Wed, Oct 05, 2022 at 09:07:55 +0100, Daniel P. Berrangé wrote:
> On Tue, Oct 04, 2022 at 10:17:18PM +0200, Jiri Denemark wrote:
> > > On Tue, Oct 04, 2022 at 07:35:31PM +0200, Jiri Denemark wrote:
> > > > On Tue, Oct 04, 2022 at 17:34:34 +0100, Daniel P. Berrangé wrote:
> > > > > On Tue, Oct 04, 2022 at 04:28:53PM +0200, Jiri Denemark wrote:
> > > > > > We already show whether a specific CPU model is usable on the current
> > > > > > host without modification via the 'usable' attribute of each CPU model.
> > > > > > But it may be useful to actually see what features are blocking each CPU
> > > > > > model from being usable. Especially when we already fetch the info from
> > > > > > QEMU and propagating it to domain capabilities XML is all we need to do.
> > > > > 
> > > > > > diff --git a/tests/domaincapsdata/qemu_4.2.0-q35.x86_64.xml b/tests/domaincapsdata/qemu_4.2.0-q35.x86_64.xml
> > > > > > index dab12e5888..8ca9e8d2b2 100644
> > > > > > --- a/tests/domaincapsdata/qemu_4.2.0-q35.x86_64.xml
> > > > > > +++ b/tests/domaincapsdata/qemu_4.2.0-q35.x86_64.xml
> > > > > > @@ -63,7 +63,7 @@
> > > > > >      <mode name='custom' supported='yes'>
> > > > > >        <model usable='yes' vendor='unknown'>qemu64</model>
> > > > > >        <model usable='yes' vendor='unknown'>qemu32</model>
> > > > > > -      <model usable='no' vendor='AMD'>phenom</model>
> > > > > > +      <model usable='no' vendor='AMD' blockers='mmxext,fxsr_opt,3dnowext,3dnow,sse4a,npt'>phenom</model>
> > > > > 
> > > > > This is an XML design anti-pattern, because it invents a data format
> > > > > inside the attribute which the caller then has to further parse.
> > > > > 
> > > > > If we want to expose this, it needs to be with child elements IMHO,
> > > > > > but yes it will be much more verbose.
> > > > 
> > > > You're absolutely right, but that's the only option we have I'm afraid.
> > > > Mixing subelements and text nodes is a much worse anti-pattern. I wish
> > > > the model name was in an attribute, but it isn't and having
> > > > 
> > > >     <model usable='no' vendor='AMD'>
> > > >       <blocker name='mmxext'/>
> > > >       phenom
> > > >     </model>
> > > > 
> > > > is just insane :-(
> > > 
> > > True, I wonder if there's a different approach to the overall problem
> > > that would be better.
> > 
> > Actually a third option just came to my mind. It's not ideal either, but
> > at least it would be a proper XML :-)
> > 
> >     <mode name='custom' supported='yes'>
> >       <model usable='yes' vendor='unknown'>qemu64</model>
> >       <model usable='no' vendor='AMD'>phenom</model>
> >       <blockers model='phenom'>
> >         <feature name='mmxext'/>
> >         <feature name='fxsr_opt'/>
> >         ...
> >       </blockers>
> >       <model ...>...</model>
> >       ...
> >     </mode>
> 
> Actually, looking at this in practice, I don't think we should be
> including this information in domcapabilities at all. It gets
> waaaaaaaay too verbose, even with the custom syntax in this current
> patch impl. Take a look at this from one of my VMs, which uses the
> qemu64 model, and thus lacks a huge number of features:
...
> I think we need to expose this in a different way, using the CPU baseline
> APIs. These already have a VIR_CPU_BASELINE_EXPAND_FEATURES flag. We
> should add a further VIR_CPU_BASELINE_BLOCKED_FEATURES flag to it.

Hmm, not a bad idea. And we don't even need a new flag, just a bit of
documentation and a bug fix. When you pass just a simple

    <cpu>
      <arch>x86_64</arch>
      <model>EPYC</model>
      <vendor>AMD</vendor>
    </cpu>

CPU definition to hypervisor-cpu-baseline, libvirt checks the host CPU
model for features included in the EPYC CPU model and disables those
that are unavailable on the host:

    <cpu mode='custom' match='exact'>
      <model fallback='forbid'>EPYC</model>
      <vendor>AMD</vendor>
      <feature policy='disable' name='sha-ni'/>
      <feature policy='disable' name='mmxext'/>
      <feature policy='disable' name='cr8legacy'/>
      <feature policy='disable' name='sse4a'/>
      <feature policy='disable' name='misalignsse'/>
      <feature policy='disable' name='osvw'/>
      <feature policy='disable' name='monitor'/>
    </cpu>

This does not exactly match the list of blockers reported by QEMU,
though: sha-ni, mmxext, fxsr-opt, cr8legacy, sse4a, misalignsse, osvw.

Libvirt disables some features which are not present in QEMU's blockers
list (monitor), but this is fine as these features are included only in
libvirt's definition of EPYC and QEMU would not enable them anyway.
Explicitly disabling them is a no-op for QEMU and helps pass libvirt's
internal
checks. Seeing a non-empty list of disabled features for a CPU model
marked as usable='yes' might just be a bit confusing. I guess just
documenting this should be enough.

On the other hand, the baseline CPU model does not disable some features
QEMU would need to disable (fxsr-opt in the example above) because they
are not included in the CPU model definition in libvirt. But this is a
bug and I'm a bit surprised to see it as I believe I addressed this
exact issue some time ago (although it's quite possible I'm just
thinking about a similar issue somewhere else).
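
To make the two mismatches concrete, here is a minimal sketch that diffs
the disabled features from the baseline output above against QEMU's
blockers list. It is only an illustration, not libvirt code: the helper
name disabled_features() and the two constants are made up for this
example, and the data is copied by hand from this mail rather than
fetched from libvirt or QEMU.

```python
import xml.etree.ElementTree as ET

# Baseline CPU XML as shown in this mail (copied by hand, not queried).
BASELINE_XML = """
<cpu mode='custom' match='exact'>
  <model fallback='forbid'>EPYC</model>
  <vendor>AMD</vendor>
  <feature policy='disable' name='sha-ni'/>
  <feature policy='disable' name='mmxext'/>
  <feature policy='disable' name='cr8legacy'/>
  <feature policy='disable' name='sse4a'/>
  <feature policy='disable' name='misalignsse'/>
  <feature policy='disable' name='osvw'/>
  <feature policy='disable' name='monitor'/>
</cpu>
"""

# Blockers reported by QEMU for EPYC, as listed in this mail.
QEMU_BLOCKERS = {
    "sha-ni", "mmxext", "fxsr-opt", "cr8legacy",
    "sse4a", "misalignsse", "osvw",
}


def disabled_features(cpu_xml):
    """Return the names of all <feature policy='disable'> children."""
    root = ET.fromstring(cpu_xml)
    return {
        feature.get("name")
        for feature in root.findall("feature")
        if feature.get("policy") == "disable"
    }


libvirt_disabled = disabled_features(BASELINE_XML)

# Disabled by libvirt but not a QEMU blocker (harmless, worth documenting).
only_libvirt = sorted(libvirt_disabled - QEMU_BLOCKERS)
# QEMU blocker that the baseline output fails to disable (the bug).
only_qemu = sorted(QEMU_BLOCKERS - libvirt_disabled)

print("only in libvirt baseline:", only_libvirt)
print("only in QEMU blockers:  ", only_qemu)
```

With the data from this mail, the first list corresponds to the case
that only needs documentation and the second to the case I consider a
bug.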

Jirka