Re: Semantics of "-cpu host" (was Re: [Qemu-devel] [PATCH 2/2] Expose tsc deadline timer cpuid to guest)

Alexander Graf <agraf@xxxxxxx> · Wed, 9 May 2012 22:30:25 +0200

On 09.05.2012, at 21:38, Eduardo Habkost wrote:

> On Wed, May 09, 2012 at 12:38:37PM +0300, Gleb Natapov wrote:
>> On Wed, May 09, 2012 at 11:05:58AM +0200, Alexander Graf wrote:
>>> 
>>> On 09.05.2012, at 10:51, Gleb Natapov wrote:
>>> 
>>>> On Wed, May 09, 2012 at 10:42:26AM +0200, Alexander Graf wrote:
>>>>> 
>>>>> 
>>>>> On 09.05.2012, at 10:14, Gleb Natapov <gleb@xxxxxxxxxx> wrote:
>>>>> 
>>>>>> On Wed, May 09, 2012 at 12:07:04AM +0200, Alexander Graf wrote:
>>>>>>> 
>>>>>>> On 08.05.2012, at 22:14, Eduardo Habkost wrote:
>>>>>>> 
>>>>>>>> On Tue, May 08, 2012 at 02:58:11AM +0200, Alexander Graf wrote:
>>>>>>>>> On 07.05.2012, at 20:21, Eduardo Habkost wrote:
>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>>>> Andre? Are you able to help to answer the question below?
>>>>>>>>>> 
>>>>>>>>>> I would like to clarify what's the expected behavior of "-cpu host" to
>>>>>>>>>> be able to continue working on it. I believe the code will need to be
>>>>>>>>>> fixed on either case, but first we need to figure out what are the
>>>>>>>>>> expectations/requirements, to know _which_ changes will be needed.
>>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>>>> On Tue, Apr 24, 2012 at 02:19:25PM -0300, Eduardo Habkost wrote:
>>>>>>>>>>> (CCing Andre Przywara, in case he can help to clarify what's the
>>>>>>>>>>> expected meaning of "-cpu host")
>>>>>>>>>>> 
>>>>>>>>>> [...]
>>>>>>>>>>> I am not sure I understand what you are proposing. Let me explain the
>>>>>>>>>>> use case I am thinking about:
>>>>>>>>>>> 
>>>>>>>>>>> - Feature FOO is of type (A) (e.g. just a new instruction set that
>>>>>>>>>>> doesn't require additional userspace support)
>>>>>>>>>>> - User has a Qemu vesion that doesn't know anything about feature FOO
>>>>>>>>>>> - User gets a new CPU that supports feature FOO
>>>>>>>>>>> - User gets a new kernel that supports feature FOO (i.e. has FOO in
>>>>>>>>>>> GET_SUPPORTED_CPUID)
>>>>>>>>>>> - User does _not_ upgrade Qemu.
>>>>>>>>>>> - User expects to get feature FOO enabled if using "-cpu host", without
>>>>>>>>>>> upgrading Qemu.
>>>>>>>>>>> 
>>>>>>>>>>> The problem here is: to support the above use-case, userspace need a
>>>>>>>>>>> probing mechanism that can differentiate _new_ (previously unknown)
>>>>>>>>>>> features that are in group (A) (safe to blindly enable) from features
>>>>>>>>>>> that are in group (B) (that can't be enabled without an userspace
>>>>>>>>>>> upgrade).
>>>>>>>>>>> 
>>>>>>>>>>> In short, it becomes a problem if we consider the following case:
>>>>>>>>>>> 
>>>>>>>>>>> - Feature BAR is of type (B) (it can't be enabled without extra
>>>>>>>>>>> userspace support)
>>>>>>>>>>> - User has a Qemu version that doesn't know anything about feature BAR
>>>>>>>>>>> - User gets a new CPU that supports feature BAR
>>>>>>>>>>> - User gets a new kernel that supports feature BAR (i.e. has BAR in
>>>>>>>>>>> GET_SUPPORTED_CPUID)
>>>>>>>>>>> - User does _not_ upgrade Qemu.
>>>>>>>>>>> - User simply shouldn't get feature BAR enabled, even if using "-cpu
>>>>>>>>>>> host", otherwise Qemu would break.
>>>>>>>>>>> 
>>>>>>>>>>> If userspace always limited itself to features it knows about, it would
>>>>>>>>>>> be really easy to implement the feature without any new probing
>>>>>>>>>>> mechanism from the kernel. But that's not how I think users expect "-cpu
>>>>>>>>>>> host" to work. Maybe I am wrong, I don't know. I am CCing Andre, who
>>>>>>>>>>> introduced the "-cpu host" feature, in case he can explain what's the
>>>>>>>>>>> expected semantics on the cases above.
>>>>>>>>> 
>>>>>>>>> Can you think of any feature that'd go into category B?
>>>>>>>> 
>>>>>>>> - TSC-deadline: can't be enabled unless userspace takes care to enable
>>>>>>>> the in-kernel irqchip.
>>>>>>> 
>>>>>>> The kernel can check if in-kernel irqchip has it enabled and otherwise mask it out, no?
>>>>>>> 
>>>>>> How kernel should know that userspace does not emulate it?
>>>>> 
>>>>> You have to enable the in-kernel apic to use it, at which point the kernel knows it's in use, right?
>>>>> 
>>>>>> 
>>>>>>>> - x2apic: ditto.
>>>>>>> 
>>>>>>> Same here. For user space irqchip the kernel side doesn't care. If in-kernel APIC is enabled, check for its capabilities.
>>>>>>> 
>>>>>> Same here.
>>>>>> 
>>>>>> Well, technically both of those features can't be implemented in
>>>>>> userspace right now since MSRs are terminated in the kernel, but I
>>>>> 
>>>>> Doesn't sound like the greatest design - unless you deprecate the non-in-kernel apic case.
>>>>> 
>>>> You mean terminating MSRs in kernel does not sound like the greatest
>>>> design? I do not disagree. That is why IMO kernel can't filter out
>>>> TSC-deadline and x2apic like you suggest.
>>> 
>>> I still don't see why it can't.
>>> 
>>> Imagine we would filter TSC-deadline and x2apic by default in the kernel - they are not known to exist yet.
>>> Now, we implement TSC-deadline in the kernel. We still filter
>>> TSC-deadline out in GET_SUPORTED_CPUID in the kernel. But we provide
>>> an interface to user space that says "call me to enable TSC-deadline
>>> CPUID, but only if you're using the in-kernel apic"
> 
> We have that interface already, it is called KVM_SET_CPUID.  :-)
> 
>>> New user space calls that ioctl when it's using the in-kernel apic, it doesn't when it's using the user space apic.
>>> Old user space doesn't call that ioctl.
>> First of all we already have TSC-deadline in GET_SUPORTED_CPUID without
>> any additional ioctls. And second I do not see why we need additional
>> iostls here.
> 
> We don't have TSC-deadline set today (and that's what started this
> thread), but we have x2apic. So what you say is true for x2apic, anyway.
> 
>> Hmm, so may be I misunderstood you. You propose to mask TSC-deadline and
>> x2apic out from GET_SUPORTED_CPUID if irq chip is not in kernel, not
>> from KVM_SET_CPUID? For those two features it may make sense indeed.
> 
> It makes sense to me.
> 
> It looks like my assumptions were wrong. They were:
> 
> - GET_SUPPORTED_CPUID simply can't know if the in-kernel irqchip is
>  going to be enabled or not.
> - GET_SUPPORTED_CPUID output has to be a function of the kernel code
>  capabilitie and host CPU, and not depend on any input from userspace.
> 
> Are those assumptions incorrect? If we break them, we may try what
> Alexander is proposing. It would be much more flexible than the options
> I was considering.
> 
> I didn't know ENABLE_CAP existed. Even if GET_SUPPORTED_CPUID can't
> check for the in-kernel irqchip setup for some reason, ENABLE_CAP could
> be used by userpace to tell the kernel "I will enable the in-kernel
> irqchip, so feel free to return features that depend on it on
> GET_SUPPORTED_CPUID".
> 
> In other words, we would return only the type-A features on
> GET_SUPPORTED_CPUID (i.e. safe to be blindly enabled by -cpu host as
> long as migration is not required), but if we use ENABLE_CAP we can make
> group A safely grow, as long as userspace first tells the kernel what it
> supports.
> 
> Anybody is against doing that? Otherwise I plan to work on this.

I think it makes a lot of sense to go this route. But get your blessing from Avi first - ultimately he owns the API :).

Alex

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html