On 05/10/2012 02:53 PM, Gleb Natapov wrote:
On Wed, May 09, 2012 at 04:38:02PM -0300, Eduardo Habkost wrote:
On Wed, May 09, 2012 at 12:38:37PM +0300, Gleb Natapov wrote:
On Wed, May 09, 2012 at 11:05:58AM +0200, Alexander Graf wrote:
On 09.05.2012, at 10:51, Gleb Natapov wrote:
On Wed, May 09, 2012 at 10:42:26AM +0200, Alexander Graf wrote:
On 09.05.2012, at 10:14, Gleb Natapov<gleb@xxxxxxxxxx> wrote:
On Wed, May 09, 2012 at 12:07:04AM +0200, Alexander Graf wrote:
On 08.05.2012, at 22:14, Eduardo Habkost wrote:
On Tue, May 08, 2012 at 02:58:11AM +0200, Alexander Graf wrote:
On 07.05.2012, at 20:21, Eduardo Habkost wrote:
Andre? Are you able to help to answer the question below?
I would like to clarify what's the expected behavior of "-cpu host" to
be able to continue working on it. I believe the code will need to be
fixed on either case, but first we need to figure out what are the
expectations/requirements, to know _which_ changes will be needed.
On Tue, Apr 24, 2012 at 02:19:25PM -0300, Eduardo Habkost wrote:
(CCing Andre Przywara, in case he can help to clarify what's the
expected meaning of "-cpu host")
[...]
I am not sure I understand what you are proposing. Let me explain the
use case I am thinking about:
- Feature FOO is of type (A) (e.g. just a new instruction set that
doesn't require additional userspace support)
- User has a Qemu vesion that doesn't know anything about feature FOO
- User gets a new CPU that supports feature FOO
- User gets a new kernel that supports feature FOO (i.e. has FOO in
GET_SUPPORTED_CPUID)
- User does _not_ upgrade Qemu.
- User expects to get feature FOO enabled if using "-cpu host", without
upgrading Qemu.
The problem here is: to support the above use-case, userspace need a
probing mechanism that can differentiate _new_ (previously unknown)
features that are in group (A) (safe to blindly enable) from features
that are in group (B) (that can't be enabled without an userspace
upgrade).
In short, it becomes a problem if we consider the following case:
- Feature BAR is of type (B) (it can't be enabled without extra
userspace support)
- User has a Qemu version that doesn't know anything about feature BAR
- User gets a new CPU that supports feature BAR
- User gets a new kernel that supports feature BAR (i.e. has BAR in
GET_SUPPORTED_CPUID)
- User does _not_ upgrade Qemu.
- User simply shouldn't get feature BAR enabled, even if using "-cpu
host", otherwise Qemu would break.
If userspace always limited itself to features it knows about, it would
be really easy to implement the feature without any new probing
mechanism from the kernel. But that's not how I think users expect "-cpu
host" to work. Maybe I am wrong, I don't know. I am CCing Andre, who
introduced the "-cpu host" feature, in case he can explain what's the
expected semantics on the cases above.
Can you think of any feature that'd go into category B?
- TSC-deadline: can't be enabled unless userspace takes care to enable
the in-kernel irqchip.
The kernel can check if in-kernel irqchip has it enabled and otherwise mask it out, no?
How kernel should know that userspace does not emulate it?
You have to enable the in-kernel apic to use it, at which point the kernel knows it's in use, right?
- x2apic: ditto.
Same here. For user space irqchip the kernel side doesn't care. If in-kernel APIC is enabled, check for its capabilities.
Same here.
Well, technically both of those features can't be implemented in
userspace right now since MSRs are terminated in the kernel, but I
Doesn't sound like the greatest design - unless you deprecate the non-in-kernel apic case.
You mean terminating MSRs in kernel does not sound like the greatest
design? I do not disagree. That is why IMO kernel can't filter out
TSC-deadline and x2apic like you suggest.
I still don't see why it can't.
Imagine we would filter TSC-deadline and x2apic by default in the kernel - they are not known to exist yet.
Now, we implement TSC-deadline in the kernel. We still filter
TSC-deadline out in GET_SUPORTED_CPUID in the kernel. But we provide
an interface to user space that says "call me to enable TSC-deadline
CPUID, but only if you're using the in-kernel apic"
We have that interface already, it is called KVM_SET_CPUID. :-)
New user space calls that ioctl when it's using the in-kernel apic, it doesn't when it's using the user space apic.
Old user space doesn't call that ioctl.
First of all we already have TSC-deadline in GET_SUPORTED_CPUID without
any additional ioctls. And second I do not see why we need additional
iostls here.
We don't have TSC-deadline set today (and that's what started this
thread), but we have x2apic. So what you say is true for x2apic, anyway.
Hmm, strange. But s/TSC-deadline/x2apic/ and the point stands :)
Hmm, so may be I misunderstood you. You propose to mask TSC-deadline and
x2apic out from GET_SUPORTED_CPUID if irq chip is not in kernel, not
from KVM_SET_CPUID? For those two features it may make sense indeed.
It makes sense to me.
It looks like my assumptions were wrong. They were:
- GET_SUPPORTED_CPUID simply can't know if the in-kernel irqchip is
going to be enabled or not.
I think this is currently true. Before we changing that we have to make
sure that no existing userspace calls GET_SUPPORTED_CPUID before
creating irqchip. Not sure we can check all existing userspaces.
- GET_SUPPORTED_CPUID output has to be a function of the kernel code
capabilitie and host CPU, and not depend on any input from userspace.
Are those assumptions incorrect? If we break them, we may try what
Alexander is proposing. It would be much more flexible than the options
I was considering.
I didn't know ENABLE_CAP existed. Even if GET_SUPPORTED_CPUID can't
What is ENABLE_CAP?
It's an ioctl that you pass in a CAP. It's used on non-x86 already to
easily turn on features inside kvm :).
check for the in-kernel irqchip setup for some reason, ENABLE_CAP could
be used by userpace to tell the kernel "I will enable the in-kernel
irqchip, so feel free to return features that depend on it on
GET_SUPPORTED_CPUID".
In other words, we would return only the type-A features on
GET_SUPPORTED_CPUID (i.e. safe to be blindly enabled by -cpu host as
long as migration is not required), but if we use ENABLE_CAP we can make
group A safely grow, as long as userspace first tells the kernel what it
supports.
Anybody is against doing that? Otherwise I plan to work on this.
Probably I will start by making GET_SUPPORTED_CPUID not return x2apic
unless userspace tells the kernel (using ENABLE_CAP) that it will enable
the in-kernel irqchip. Then we can do the same with TSC-deadline.
Note that all this work is to allow the kernel to let userspace blindly
enable features it _doesn't know yet_. If we limit ourselves to features
userspace already knows about, we could simply remove x2apic and
TSC-deadline from GET_SUPPORTED_CPUID completely, not use ENABLE_CAP for
that, and let userspace set x2apic or TSC-deadline on KVM_SET_CPUID only
if it knows it is safe (either because it checked for the corresponding
KVM_CAP_* capability is present and it will enable the in-kernel
irqchip, or because it will emulated it in userspace).
There is not KVM_CAP_* for x2apic as far as I see.
In case anybody is against the proposal above: note that the current
documented GET_SUPPORTED_CPUID semantics (unconditionally returning bits
that depend on specific userspace behavior/capabilities) is simply
unusable by "-cpu host". If the above proposal gets rejected, my Plan B
is to update the GET_SUPPORTED_CPUID documentation to note that it
returns only type-A features (features that userspace can safely enable
even if it doesn't know what it does), remove x2apic from
GET_SUPPORTED_CPUID forever, and use KVM_CAP_* for discovery of all
type-B features (features that depend on specific userspace
capabilities/behavior).
This will break existing userspaces, no?
It would mean that new user space on old kernels don't get x2apic or
tsc-deadline in cpuid, yes.
Not
sure there won't be others that are not dependent on irq chip presence.
You propose to add additional ioctls to enable them if they appear?
I am sure there will be new features in the future that don't depend on
any userspace support, so they would be enabled on GET_SUPPORTED_CPUID
unconditionally.
Those not the kind of features I am worry about though. I worry about
the features that can be emulated either by kernel or by userspace and
userspace may choose where it wants the feature to be emulated.
Like? Usually CPUID features reflect new instructions. We don't notify
user space on unknown instructions, right?
Alex
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html