Re: Semantics of "-cpu host" (was Re: [Qemu-devel] [PATCH 2/2] Expose tsc deadline timer cpuid to guest)

Eduardo Habkost <ehabkost@xxxxxxxxxx> · Wed, 9 May 2012 16:38:02 -0300

On Wed, May 09, 2012 at 12:38:37PM +0300, Gleb Natapov wrote:
> On Wed, May 09, 2012 at 11:05:58AM +0200, Alexander Graf wrote:
> > 
> > On 09.05.2012, at 10:51, Gleb Natapov wrote:
> > 
> > > On Wed, May 09, 2012 at 10:42:26AM +0200, Alexander Graf wrote:
> > >> 
> > >> 
> > >> On 09.05.2012, at 10:14, Gleb Natapov <gleb@xxxxxxxxxx> wrote:
> > >> 
> > >>> On Wed, May 09, 2012 at 12:07:04AM +0200, Alexander Graf wrote:
> > >>>> 
> > >>>> On 08.05.2012, at 22:14, Eduardo Habkost wrote:
> > >>>> 
> > >>>>> On Tue, May 08, 2012 at 02:58:11AM +0200, Alexander Graf wrote:
> > >>>>>> On 07.05.2012, at 20:21, Eduardo Habkost wrote:
> > >>>>>> 
> > >>>>>>> 
> > >>>>>>> Andre? Are you able to help to answer the question below?
> > >>>>>>> 
> > >>>>>>> I would like to clarify what's the expected behavior of "-cpu host" to
> > >>>>>>> be able to continue working on it. I believe the code will need to be
> > >>>>>>> fixed on either case, but first we need to figure out what are the
> > >>>>>>> expectations/requirements, to know _which_ changes will be needed.
> > >>>>>>> 
> > >>>>>>> 
> > >>>>>>> On Tue, Apr 24, 2012 at 02:19:25PM -0300, Eduardo Habkost wrote:
> > >>>>>>>> (CCing Andre Przywara, in case he can help to clarify what's the
> > >>>>>>>> expected meaning of "-cpu host")
> > >>>>>>>> 
> > >>>>>>> [...]
> > >>>>>>>> I am not sure I understand what you are proposing. Let me explain the
> > >>>>>>>> use case I am thinking about:
> > >>>>>>>> 
> > >>>>>>>> - Feature FOO is of type (A) (e.g. just a new instruction set that
> > >>>>>>>> doesn't require additional userspace support)
> > > >>>>>>>> - User has a Qemu version that doesn't know anything about feature FOO
> > >>>>>>>> - User gets a new CPU that supports feature FOO
> > >>>>>>>> - User gets a new kernel that supports feature FOO (i.e. has FOO in
> > >>>>>>>> GET_SUPPORTED_CPUID)
> > >>>>>>>> - User does _not_ upgrade Qemu.
> > >>>>>>>> - User expects to get feature FOO enabled if using "-cpu host", without
> > >>>>>>>> upgrading Qemu.
> > >>>>>>>> 
> > > >>>>>>>> The problem here is: to support the above use-case, userspace needs a
> > > >>>>>>>> probing mechanism that can differentiate _new_ (previously unknown)
> > > >>>>>>>> features that are in group (A) (safe to blindly enable) from features
> > > >>>>>>>> that are in group (B) (that can't be enabled without a userspace
> > > >>>>>>>> upgrade).
> > >>>>>>>> 
> > >>>>>>>> In short, it becomes a problem if we consider the following case:
> > >>>>>>>> 
> > >>>>>>>> - Feature BAR is of type (B) (it can't be enabled without extra
> > >>>>>>>> userspace support)
> > >>>>>>>> - User has a Qemu version that doesn't know anything about feature BAR
> > >>>>>>>> - User gets a new CPU that supports feature BAR
> > >>>>>>>> - User gets a new kernel that supports feature BAR (i.e. has BAR in
> > >>>>>>>> GET_SUPPORTED_CPUID)
> > >>>>>>>> - User does _not_ upgrade Qemu.
> > >>>>>>>> - User simply shouldn't get feature BAR enabled, even if using "-cpu
> > >>>>>>>> host", otherwise Qemu would break.
> > >>>>>>>> 
> > >>>>>>>> If userspace always limited itself to features it knows about, it would
> > >>>>>>>> be really easy to implement the feature without any new probing
> > >>>>>>>> mechanism from the kernel. But that's not how I think users expect "-cpu
> > >>>>>>>> host" to work. Maybe I am wrong, I don't know. I am CCing Andre, who
> > >>>>>>>> introduced the "-cpu host" feature, in case he can explain what's the
> > >>>>>>>> expected semantics on the cases above.
> > >>>>>> 
> > >>>>>> Can you think of any feature that'd go into category B?
> > >>>>> 
> > >>>>> - TSC-deadline: can't be enabled unless userspace takes care to enable
> > >>>>> the in-kernel irqchip.
> > >>>> 
> > >>>> The kernel can check if in-kernel irqchip has it enabled and otherwise mask it out, no?
> > >>>> 
> > > >>> How should the kernel know that userspace does not emulate it?
> > >> 
> > >> You have to enable the in-kernel apic to use it, at which point the kernel knows it's in use, right?
> > >> 
> > >>> 
> > >>>>> - x2apic: ditto.
> > >>>> 
> > >>>> Same here. For user space irqchip the kernel side doesn't care. If in-kernel APIC is enabled, check for its capabilities.
> > >>>> 
> > >>> Same here.
> > >>> 
> > >>> Well, technically both of those features can't be implemented in
> > >>> userspace right now since MSRs are terminated in the kernel, but I
> > >> 
> > >> Doesn't sound like the greatest design - unless you deprecate the non-in-kernel apic case.
> > >> 
> > > You mean terminating MSRs in kernel does not sound like the greatest
> > > design? I do not disagree. That is why IMO kernel can't filter out
> > > TSC-deadline and x2apic like you suggest.
> > 
> > I still don't see why it can't.
> > 
> > Imagine we would filter TSC-deadline and x2apic by default in the kernel - they are not known to exist yet.
> > Now, we implement TSC-deadline in the kernel. We still filter
> > TSC-deadline out in GET_SUPPORTED_CPUID in the kernel. But we provide
> > an interface to user space that says "call me to enable TSC-deadline
> > CPUID, but only if you're using the in-kernel apic"

We have that interface already, it is called KVM_SET_CPUID.  :-)

> > New user space calls that ioctl when it's using the in-kernel apic, it doesn't when it's using the user space apic.
> > Old user space doesn't call that ioctl.
> First of all we already have TSC-deadline in GET_SUPPORTED_CPUID without
> any additional ioctls. And second I do not see why we need additional
> ioctls here.

We don't have TSC-deadline set today (and that's what started this
thread), but we have x2apic. So what you say is true for x2apic, anyway.

> Hmm, so maybe I misunderstood you. You propose to mask TSC-deadline and
> x2apic out from GET_SUPPORTED_CPUID if irq chip is not in kernel, not
> from KVM_SET_CPUID? For those two features it may make sense indeed.

It makes sense to me.

It looks like my assumptions were wrong. They were:

- GET_SUPPORTED_CPUID simply can't know if the in-kernel irqchip is
  going to be enabled or not.
- GET_SUPPORTED_CPUID output has to be a function of the kernel code
  capabilities and host CPU, and not depend on any input from userspace.

Are those assumptions incorrect? If we break them, we may try what
Alexander is proposing. It would be much more flexible than the options
I was considering.

I didn't know ENABLE_CAP existed. Even if GET_SUPPORTED_CPUID can't
check for the in-kernel irqchip setup for some reason, ENABLE_CAP could
be used by userspace to tell the kernel "I will enable the in-kernel
irqchip, so feel free to return features that depend on it on
GET_SUPPORTED_CPUID".

In other words, we would return only the type-A features on
GET_SUPPORTED_CPUID (i.e. safe to be blindly enabled by -cpu host as
long as migration is not required), but if we use ENABLE_CAP we can make
group A safely grow, as long as userspace first tells the kernel what it
supports.
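A minimal Python model of the ENABLE_CAP proposal, assuming only CPUID leaf 1 ECX is tracked, as a plain bitmask. The bit positions for SSE3, x2APIC and TSC-deadline follow the Intel SDM; the capability string and every function name are hypothetical placeholders, not the KVM API. It contrasts the assumption listed above (output depends only on host CPU and kernel code) with a version that hides type-B bits until userspace opts in.

```python
# Hypothetical model of the proposed GET_SUPPORTED_CPUID filtering.
# Only CPUID leaf 1 ECX is modeled, as a plain integer bitmask.
SSE3 = 1 << 0           # CPUID.01H:ECX[0], a type-A feature
X2APIC = 1 << 21        # CPUID.01H:ECX[21], type-B (needs in-kernel irqchip)
TSC_DEADLINE = 1 << 24  # CPUID.01H:ECX[24], type-B (needs in-kernel irqchip)

# Placeholder for "userspace promised to enable the in-kernel irqchip";
# not a real KVM capability constant.
IRQCHIP_CAP = "in-kernel-irqchip"

# Type-B bits and the capability userspace must enable before they are reported.
TYPE_B_REQUIREMENTS = {
    X2APIC: IRQCHIP_CAP,
    TSC_DEADLINE: IRQCHIP_CAP,
}


def supported_cpuid_today(host_ecx: int, kernel_ecx: int) -> int:
    """Assumed behavior: depends only on the host CPU and the kernel code."""
    return host_ecx & kernel_ecx


def supported_cpuid_proposed(host_ecx: int, kernel_ecx: int,
                             enabled_caps: set) -> int:
    """Proposal: type-B bits are reported only after the matching ENABLE_CAP."""
    supported = host_ecx & kernel_ecx
    for bit, cap in TYPE_B_REQUIREMENTS.items():
        if cap not in enabled_caps:
            supported &= ~bit  # hide the bit from userspace that did not opt in
    return supported


host = kernel = SSE3 | X2APIC | TSC_DEADLINE
old_qemu = supported_cpuid_proposed(host, kernel, set())          # never calls ENABLE_CAP
new_qemu = supported_cpuid_proposed(host, kernel, {IRQCHIP_CAP})  # opted in
```

Here old userspace sees only SSE3, so blindly enabling everything it gets stays safe, while userspace that opted in also sees x2apic and TSC-deadline. In this sketch, a future type-B feature would simply be a new entry in the table, keyed to the capability it depends on.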

Is anybody against doing that? Otherwise I plan to work on this.
Probably I will start by making GET_SUPPORTED_CPUID not return x2apic
unless userspace tells the kernel (using ENABLE_CAP) that it will enable
the in-kernel irqchip. Then we can do the same with TSC-deadline.

Note that all this work is to allow the kernel to let userspace blindly
enable features it _doesn't know yet_. If we limit ourselves to features
userspace already knows about, we could simply remove x2apic and
TSC-deadline from GET_SUPPORTED_CPUID completely, not use ENABLE_CAP for
that, and let userspace set x2apic or TSC-deadline on KVM_SET_CPUID only
if it knows it is safe (either because it checked that the corresponding
KVM_CAP_* capability is present and it will enable the in-kernel
irqchip, or because it will emulate it in userspace).

In case anybody is against the proposal above: note that the current
documented GET_SUPPORTED_CPUID semantics (unconditionally returning bits
that depend on specific userspace behavior/capabilities) is simply
unusable by "-cpu host". If the above proposal gets rejected, my Plan B
is to update the GET_SUPPORTED_CPUID documentation to note that it
returns only type-A features (features that userspace can safely enable
even if it doesn't know what it does), remove x2apic from
GET_SUPPORTED_CPUID forever, and use KVM_CAP_* for discovery of all
type-B features (features that depend on specific userspace
capabilities/behavior).
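The two userspace strategies above ("known features only" and Plan B's blind enabling of type-A bits) differ in a single step, which a small Python sketch can make concrete. It assumes GET_SUPPORTED_CPUID no longer reports x2apic or TSC-deadline, and that a KVM_CAP_* probe result has already been mapped to a bitmask of type-B features; that mapping, the function name and the FOO stand-in bit are hypothetical.

```python
# Hypothetical sketch of how "-cpu host" could build leaf-1 ECX for KVM_SET_CPUID.
SSE3 = 1 << 0           # CPUID.01H:ECX[0], a type-A feature this QEMU knows
FOO = 1 << 29           # arbitrary bit standing in for the unknown type-A feature FOO
X2APIC = 1 << 21        # CPUID.01H:ECX[21], type-B
TSC_DEADLINE = 1 << 24  # CPUID.01H:ECX[24], type-B

KNOWN_TYPE_A = SSE3
KNOWN_TYPE_B = X2APIC | TSC_DEADLINE


def cpu_host_ecx(supported: int, cap_bits: int, irqchip_in_kernel: bool,
                 blind_type_a: bool) -> int:
    """Return the leaf-1 ECX value a hypothetical "-cpu host" would set.

    supported: GET_SUPPORTED_CPUID output, assumed to hold type-A bits only.
    cap_bits: type-B bits whose KVM_CAP_* probe succeeded (hypothetical mapping).
    blind_type_a: True for Plan B (trust every reported bit), False to keep
    only the features this userspace already knows about.
    """
    ecx = supported if blind_type_a else supported & KNOWN_TYPE_A
    if irqchip_in_kernel:
        # Type-B bits are only ever enabled for features userspace knows how to gate.
        ecx |= cap_bits & KNOWN_TYPE_B
    return ecx


supported = SSE3 | FOO  # x2apic/TSC-deadline are no longer reported here
caps = X2APIC | TSC_DEADLINE
known_only = cpu_host_ecx(supported, caps, True, blind_type_a=False)
plan_b = cpu_host_ecx(supported, caps, True, blind_type_a=True)
```

With known_only, FOO is dropped until QEMU is upgraded; with plan_b, FOO reaches the guest without an upgrade, which is only safe because the documented contract would restrict GET_SUPPORTED_CPUID to type-A bits.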

> Not
> sure there won't be others that are not dependent on irq chip presence.
> You propose to add additional ioctls to enable them if they appear?

I am sure there will be new features in the future that don't depend on
any userspace support, so they would be enabled on GET_SUPPORTED_CPUID
unconditionally.

But if we have new features that depend on specific userspace
capabilities/behavior (e.g. enabling the irqchip, or something else), we
could also add them as long as we check if that capability/behavior was
enabled using ENABLE_CAP.

> > 
> > So at the end all bits in GET_SUPPORTED_CPUID are consistent with what user space is capable of.
> > 
> GET_SUPPORTED_CPUID does not necessarily have to be consistent with what user
> space is capable of. Userspace may emulate features that are not in
> GET_SUPPORTED_CPUID.

True. We don't need to make the interface too complex just to make
GET_SUPPORTED_CPUID match exactly what userspace is going to enable. If
userspace wants to enable a feature because it can emulate it on its
own, it can just enable it using SET_CPUID.
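The point that userspace can enable emulated features on its own can be sketched in Python, assuming leaf-1 ECX is again a plain bitmask. BAZ is an arbitrary bit standing in for a hypothetical feature emulated entirely in userspace, and the function name is illustrative, not part of QEMU or KVM.

```python
# Hypothetical sketch: composing the leaf-1 ECX value passed to KVM_SET_CPUID.
SSE3 = 1 << 0  # CPUID.01H:ECX[0], provided by the kernel
BAZ = 1 << 3   # arbitrary bit standing in for a feature userspace emulates


def set_cpuid_ecx(supported: int, emulated: int, requested: int) -> int:
    """Return the bits userspace would hand to KVM_SET_CPUID.

    supported: GET_SUPPORTED_CPUID output (what the kernel can provide).
    emulated:  bits userspace implements itself; these need not appear
               in GET_SUPPORTED_CPUID at all.
    requested: bits selected by the CPU model or the command line.
    """
    return (supported | emulated) & requested


ecx = set_cpuid_ecx(supported=SSE3, emulated=BAZ, requested=SSE3 | BAZ)
```

Because emulated bits are ORed in after the kernel's answer, GET_SUPPORTED_CPUID never has to describe them, which is why it need not match exactly what userspace ends up enabling.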

-- 
Eduardo