On Thu, May 10, 2012 at 03:21:41PM +0200, Alexander Graf wrote: > On 05/10/2012 02:53 PM, Gleb Natapov wrote: > >On Wed, May 09, 2012 at 04:38:02PM -0300, Eduardo Habkost wrote: > >>On Wed, May 09, 2012 at 12:38:37PM +0300, Gleb Natapov wrote: > >>>On Wed, May 09, 2012 at 11:05:58AM +0200, Alexander Graf wrote: > >>>>On 09.05.2012, at 10:51, Gleb Natapov wrote: > >>>> > >>>>>On Wed, May 09, 2012 at 10:42:26AM +0200, Alexander Graf wrote: > >>>>>> > >>>>>>On 09.05.2012, at 10:14, Gleb Natapov<gleb@xxxxxxxxxx> wrote: > >>>>>> > >>>>>>>On Wed, May 09, 2012 at 12:07:04AM +0200, Alexander Graf wrote: > >>>>>>>>On 08.05.2012, at 22:14, Eduardo Habkost wrote: > >>>>>>>> > >>>>>>>>>On Tue, May 08, 2012 at 02:58:11AM +0200, Alexander Graf wrote: > >>>>>>>>>>On 07.05.2012, at 20:21, Eduardo Habkost wrote: > >>>>>>>>>> > >>>>>>>>>>>Andre? Are you able to help to answer the question below? > >>>>>>>>>>> > >>>>>>>>>>>I would like to clarify what's the expected behavior of "-cpu host" to > >>>>>>>>>>>be able to continue working on it. I believe the code will need to be > >>>>>>>>>>>fixed on either case, but first we need to figure out what are the > >>>>>>>>>>>expectations/requirements, to know _which_ changes will be needed. > >>>>>>>>>>> > >>>>>>>>>>> > >>>>>>>>>>>On Tue, Apr 24, 2012 at 02:19:25PM -0300, Eduardo Habkost wrote: > >>>>>>>>>>>>(CCing Andre Przywara, in case he can help to clarify what's the > >>>>>>>>>>>>expected meaning of "-cpu host") > >>>>>>>>>>>> > >>>>>>>>>>>[...] > >>>>>>>>>>>>I am not sure I understand what you are proposing. Let me explain the > >>>>>>>>>>>>use case I am thinking about: > >>>>>>>>>>>> > >>>>>>>>>>>>- Feature FOO is of type (A) (e.g. just a new instruction set that > >>>>>>>>>>>>doesn't require additional userspace support) > >>>>>>>>>>>>- User has a Qemu vesion that doesn't know anything about feature FOO > >>>>>>>>>>>>- User gets a new CPU that supports feature FOO > >>>>>>>>>>>>- User gets a new kernel that supports feature FOO (i.e. has FOO in > >>>>>>>>>>>>GET_SUPPORTED_CPUID) > >>>>>>>>>>>>- User does _not_ upgrade Qemu. > >>>>>>>>>>>>- User expects to get feature FOO enabled if using "-cpu host", without > >>>>>>>>>>>>upgrading Qemu. > >>>>>>>>>>>> > >>>>>>>>>>>>The problem here is: to support the above use-case, userspace need a > >>>>>>>>>>>>probing mechanism that can differentiate _new_ (previously unknown) > >>>>>>>>>>>>features that are in group (A) (safe to blindly enable) from features > >>>>>>>>>>>>that are in group (B) (that can't be enabled without an userspace > >>>>>>>>>>>>upgrade). > >>>>>>>>>>>> > >>>>>>>>>>>>In short, it becomes a problem if we consider the following case: > >>>>>>>>>>>> > >>>>>>>>>>>>- Feature BAR is of type (B) (it can't be enabled without extra > >>>>>>>>>>>>userspace support) > >>>>>>>>>>>>- User has a Qemu version that doesn't know anything about feature BAR > >>>>>>>>>>>>- User gets a new CPU that supports feature BAR > >>>>>>>>>>>>- User gets a new kernel that supports feature BAR (i.e. has BAR in > >>>>>>>>>>>>GET_SUPPORTED_CPUID) > >>>>>>>>>>>>- User does _not_ upgrade Qemu. > >>>>>>>>>>>>- User simply shouldn't get feature BAR enabled, even if using "-cpu > >>>>>>>>>>>>host", otherwise Qemu would break. > >>>>>>>>>>>> > >>>>>>>>>>>>If userspace always limited itself to features it knows about, it would > >>>>>>>>>>>>be really easy to implement the feature without any new probing > >>>>>>>>>>>>mechanism from the kernel. But that's not how I think users expect "-cpu > >>>>>>>>>>>>host" to work. Maybe I am wrong, I don't know. I am CCing Andre, who > >>>>>>>>>>>>introduced the "-cpu host" feature, in case he can explain what's the > >>>>>>>>>>>>expected semantics on the cases above. > >>>>>>>>>>Can you think of any feature that'd go into category B? > >>>>>>>>>- TSC-deadline: can't be enabled unless userspace takes care to enable > >>>>>>>>>the in-kernel irqchip. > >>>>>>>>The kernel can check if in-kernel irqchip has it enabled and otherwise mask it out, no? > >>>>>>>> > >>>>>>>How kernel should know that userspace does not emulate it? > >>>>>>You have to enable the in-kernel apic to use it, at which point the kernel knows it's in use, right? > >>>>>> > >>>>>>>>>- x2apic: ditto. > >>>>>>>>Same here. For user space irqchip the kernel side doesn't care. If in-kernel APIC is enabled, check for its capabilities. > >>>>>>>> > >>>>>>>Same here. > >>>>>>> > >>>>>>>Well, technically both of those features can't be implemented in > >>>>>>>userspace right now since MSRs are terminated in the kernel, but I > >>>>>>Doesn't sound like the greatest design - unless you deprecate the non-in-kernel apic case. > >>>>>> > >>>>>You mean terminating MSRs in kernel does not sound like the greatest > >>>>>design? I do not disagree. That is why IMO kernel can't filter out > >>>>>TSC-deadline and x2apic like you suggest. > >>>>I still don't see why it can't. > >>>> > >>>>Imagine we would filter TSC-deadline and x2apic by default in the kernel - they are not known to exist yet. > >>>>Now, we implement TSC-deadline in the kernel. We still filter > >>>>TSC-deadline out in GET_SUPORTED_CPUID in the kernel. But we provide > >>>>an interface to user space that says "call me to enable TSC-deadline > >>>>CPUID, but only if you're using the in-kernel apic" > >>We have that interface already, it is called KVM_SET_CPUID. :-) > >> > >>>>New user space calls that ioctl when it's using the in-kernel apic, it doesn't when it's using the user space apic. > >>>>Old user space doesn't call that ioctl. > >>>First of all we already have TSC-deadline in GET_SUPORTED_CPUID without > >>>any additional ioctls. And second I do not see why we need additional > >>>iostls here. > >>We don't have TSC-deadline set today (and that's what started this > >>thread), but we have x2apic. So what you say is true for x2apic, anyway. > >> > >Hmm, strange. But s/TSC-deadline/x2apic/ and the point stands :) > > > >>>Hmm, so may be I misunderstood you. You propose to mask TSC-deadline and > >>>x2apic out from GET_SUPORTED_CPUID if irq chip is not in kernel, not > >>>from KVM_SET_CPUID? For those two features it may make sense indeed. > >>It makes sense to me. > >> > >>It looks like my assumptions were wrong. They were: > >> > >>- GET_SUPPORTED_CPUID simply can't know if the in-kernel irqchip is > >> going to be enabled or not. > >I think this is currently true. Before we changing that we have to make > >sure that no existing userspace calls GET_SUPPORTED_CPUID before > >creating irqchip. Not sure we can check all existing userspaces. > > > >>- GET_SUPPORTED_CPUID output has to be a function of the kernel code > >> capabilitie and host CPU, and not depend on any input from userspace. > >> > >>Are those assumptions incorrect? If we break them, we may try what > >>Alexander is proposing. It would be much more flexible than the options > >>I was considering. > >> > >>I didn't know ENABLE_CAP existed. Even if GET_SUPPORTED_CPUID can't > >What is ENABLE_CAP? > > It's an ioctl that you pass in a CAP. It's used on non-x86 already > to easily turn on features inside kvm :). > No wonder grep in arch/x86 told me nothing :) So for x86 this ioctl as good as non-existent for now. > > > >>check for the in-kernel irqchip setup for some reason, ENABLE_CAP could > >>be used by userpace to tell the kernel "I will enable the in-kernel > >>irqchip, so feel free to return features that depend on it on > >>GET_SUPPORTED_CPUID". > >> > >>In other words, we would return only the type-A features on > >>GET_SUPPORTED_CPUID (i.e. safe to be blindly enabled by -cpu host as > >>long as migration is not required), but if we use ENABLE_CAP we can make > >>group A safely grow, as long as userspace first tells the kernel what it > >>supports. > >> > >>Anybody is against doing that? Otherwise I plan to work on this. > >>Probably I will start by making GET_SUPPORTED_CPUID not return x2apic > >>unless userspace tells the kernel (using ENABLE_CAP) that it will enable > >>the in-kernel irqchip. Then we can do the same with TSC-deadline. > >> > >>Note that all this work is to allow the kernel to let userspace blindly > >>enable features it _doesn't know yet_. If we limit ourselves to features > >>userspace already knows about, we could simply remove x2apic and > >>TSC-deadline from GET_SUPPORTED_CPUID completely, not use ENABLE_CAP for > >>that, and let userspace set x2apic or TSC-deadline on KVM_SET_CPUID only > >>if it knows it is safe (either because it checked for the corresponding > >>KVM_CAP_* capability is present and it will enable the in-kernel > >>irqchip, or because it will emulated it in userspace). > >> > >There is not KVM_CAP_* for x2apic as far as I see. > > > >>In case anybody is against the proposal above: note that the current > >>documented GET_SUPPORTED_CPUID semantics (unconditionally returning bits > >>that depend on specific userspace behavior/capabilities) is simply > >>unusable by "-cpu host". If the above proposal gets rejected, my Plan B > >>is to update the GET_SUPPORTED_CPUID documentation to note that it > >>returns only type-A features (features that userspace can safely enable > >>even if it doesn't know what it does), remove x2apic from > >>GET_SUPPORTED_CPUID forever, and use KVM_CAP_* for discovery of all > >>type-B features (features that depend on specific userspace > >>capabilities/behavior). > >> > >This will break existing userspaces, no? > > It would mean that new user space on old kernels don't get x2apic or > tsc-deadline in cpuid, yes. > tsc-deadline actually exposed via capability, but disappearance of x2apic would be serious regression. > > > >>>Not > >>>sure there won't be others that are not dependent on irq chip presence. > >>>You propose to add additional ioctls to enable them if they appear? > >>I am sure there will be new features in the future that don't depend on > >>any userspace support, so they would be enabled on GET_SUPPORTED_CPUID > >>unconditionally. > >> > >Those not the kind of features I am worry about though. I worry about > >the features that can be emulated either by kernel or by userspace and > >userspace may choose where it wants the feature to be emulated. > > Like? Usually CPUID features reflect new instructions. We don't > notify user space on unknown instructions, right? > No we do not. I can't give you concrete example, but think about some apic unrelated device that is accessible via MSR and can be emulated either in kernel or in userspace. Hmm, maybe something like ACPI (Thermal Monitor and Software Controlled Clock Facilities), or PSN (Processor Serial Number). -- Gleb. -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html