Re: [PATCH v2] kvm: x86: emulate monitor and mwait instructions as nop

On Mon, Jun 02, 2014 at 11:01:07PM +0200, Alexander Graf wrote:
> 
> 
> > On 02.06.2014 at 22:41, "Michael S. Tsirkin" <mst@xxxxxxxxxx> wrote:
> > 
> >> On Mon, Jun 02, 2014 at 10:35:56PM +0200, Alexander Graf wrote:
> >> 
> >> 
> >>>> On 02.06.2014 at 22:20, "Michael S. Tsirkin" <mst@xxxxxxxxxx> wrote:
> >>>> 
> >>>> On Mon, Jun 02, 2014 at 09:48:19PM +0200, Alexander Graf wrote:
> >>>> 
> >>>> 
> >>>>>> On 02.06.2014 at 21:25, "Gabriel L. Somlo" <gsomlo@xxxxxxxxx> wrote:
> >>>>>> 
> >>>>>> On Wed, May 07, 2014 at 04:52:13PM -0400, Gabriel L. Somlo wrote:
> >>>>>> Treat monitor and mwait instructions as nop, which is architecturally
> >>>>>> correct (but inefficient) behavior. We do this to prevent misbehaving
> >>>>>> guests (e.g. OS X <= 10.7) from crashing after they fail to check for
> >>>>>> monitor/mwait availability via cpuid.
> >>>>>> 
> >>>>>> Since mwait-based idle loops relying on these nop-emulated instructions
> >>>>>> would keep the host CPU pegged at 100%, do NOT advertise their presence
> >>>>>> via cpuid, to prevent compliant guests from using them inadvertently.
> >>>>>> 
> >>>>>> Signed-off-by: Gabriel L. Somlo <somlo@xxxxxxx>
> >>>>>> ---
> >>>>>> 
> >>>>>> New in v2: remove invalid_op handler functions which were only used to
> >>>>>>         handle exits caused by monitor and mwait
> >>>>>> 
> >>>>>>>> On Wed, May 07, 2014 at 08:31:27PM +0200, Alexander Graf wrote:
> >>>>>>>> On 05/07/2014 08:15 PM, Michael S. Tsirkin wrote:
> >>>>>>>> If we really want to be paranoid and worry about guests
> >>>>>>>> that use this strange way to trigger invalid opcode,
> >>>>>>>> we can make it possible for userspace to enable/disable
> >>>>>>>> this hack, and teach qemu to set it.
> >>>>>>>> 
> >>>>>>>> That would make it even safer than it was.
> >>>>>>>> 
> >>>>>>>> Not sure it's worth it, just a thought.
> >>>>>>> 
> >>>>>>> Since we don't trap on non-exposed other instructions (new SSE and
> >>>>>>> whatdoiknow) I don't think it's really bad to just expose
> >>>>>>> MONITOR/MWAIT as nops.
> >>>>> 
> >>>>> Would it make sense to make this a module parameter,
> >>>>> (e.g., "int emulate_mwait") ?
> >>>>> 
> >>>>> Default would be 0 (no emulation). 1 would mean "emulate as nop", and
> >>>>> if anyone ever figures out how to do proper page-locking based
> >>>>> emulation we could use 2 to enable that, etc. ?
> >>>>> 
> >>>>> Not sure we'd want qemu to enable/disable it automatically, though...
> >>>>> 
> >>>>> What do you all think ?
> >>>> 
> >>>> I don't like module parameters - they're system global and there's a good chance you want to run non-osx in parallel ;).
> >>>> 
> >>>> I'd either link this to the cpuid bits or enable it forcefully through ENABLE_CAP per vcpu.
> >>>> 
> >>>> Alex
> >>> 
> >>> Point is that.
> >>> Paolo here thinks it's safe to just make it a NOP unconditionally.
> >>> so module parameter would be there as a debugging tool:
> >>> as a means for users to test with old kvm behaviour if they see breakage.
> >>> Which we don't expect, so no need to waste cycles creating a pretty
> >>> interface for it.
> >> 
> >> Both interfaces already exist, so where's the problem?
> > 
> > Hmm sorry which interfaces for enabling mwait nop emulation exist?
> 
> User space can force cpuid bits that kvm doesn't return as supported, so we do have a negative-by-default switch.
> 
> We also have an ENABLE_CAP ioctl. Enabling the monitor/mwait nop ability explicitly by that is a 5 line patch.
> 
> Either way is very flexible and not system wide.

W.r.t. monitor/mwait, a guest can do one of the following:

1. Never check CPUID, and never use monitor/mwait
	- This is great, we don't have to do anything about these

2. Check CPUID for mwait, use it to idle in preference over hlt
	- Linux, Windows, and Mavericks (10.9) do this
	- we never want to have CPUID say "yes" to these, since
	  monitor/mwait support will be clunky in the best case,
	  and hlt is overwhelmingly preferable! [*]

3. Never check CPUID, use monitor/mwait with abandon
	- OS X 10.6 .. 10.8 does this
	- emulating monitor/mwait here allows us to boot the guest
	  and use it, and perform sysadmin surgery to force a hlt
	  based idle

4. Check CPUID, panic if unavailable
	- OS X 10.5 did this, IIRC.
	- whether I can do kext surgery and get it to stop checking
	  CPUID *in addition to* falling back to hlt-based idle is
	  TBD.
	- emulating monitor/mwait allows us to boot this type of
	  guest, BUT WE ALSO HAVE TO ADVERTISE IT VIA CPUID !!!

I like telling qemu on the command line "do monitor = mwait = nop;
for this guest only", and having qemu pass that on to KVM for only the
VCPUs associated with this guest, optionally, for cases 3 and 4 only
(everyone else gets the invalid opcode fault behavior as before).
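
To make the four cases above concrete, here is a rough sketch of the
per-guest decision as a small C table. All names (guest_mwait_kind,
mwait_policy, mwait_policy_for) are made up for illustration and are
not existing KVM or qemu identifiers; treat it as a summary of the
list above, not as proposed code.

```c
#include <stdbool.h>

/* The four guest behaviors enumerated above (hypothetical names). */
enum guest_mwait_kind {
	GUEST_NEVER_USES_MWAIT,      /* case 1: no CPUID check, no mwait */
	GUEST_CHECKS_CPUID_PREFERS,  /* case 2: uses mwait only if CPUID says so */
	GUEST_USES_WITHOUT_CHECK,    /* case 3: OS X 10.6 .. 10.8 */
	GUEST_PANICS_WITHOUT_CPUID,  /* case 4: OS X 10.5, IIRC */
};

/* What we would want KVM/qemu to do for one particular guest. */
struct mwait_policy {
	bool emulate_as_nop;      /* monitor/mwait become nops instead of faulting */
	bool advertise_in_cpuid;  /* report monitor/mwait as available via CPUID */
};

static struct mwait_policy mwait_policy_for(enum guest_mwait_kind kind)
{
	/* Default: unchanged behavior (invalid opcode fault, CPUID says "no"). */
	struct mwait_policy p = { false, false };

	switch (kind) {
	case GUEST_USES_WITHOUT_CHECK:
		p.emulate_as_nop = true;
		break;
	case GUEST_PANICS_WITHOUT_CPUID:
		p.emulate_as_nop = true;
		p.advertise_in_cpuid = true;
		break;
	default:
		/* Cases 1 and 2: nothing to emulate, nothing to advertise. */
		break;
	}
	return p;
}
```

Only case 4 ever sets advertise_in_cpuid, which is why this has to be
a per-guest switch rather than a system-wide one.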


[*] I think we've been over this a few times already, but here's a
    quick recap:

	- monitor == mwait == NOP is correct (albeit wasteful) behavior
		- mwait MUST expect and deal with spurious wakeups
		  (per the Intel manual)
		- mwait == nop is an INSTANT spurious wakeup (hence
		  works OK with any correctly written program) !
		- monitor == nop won't "arm" anything, but that
		  doesn't matter if mwait always immediately wakes up !

		- this pegs the host CPU to 100%, so MUCH worse than
		  hlt, shouldn't do it unless we ABSOLUTELY HAVE TO !!!

	- guest-mode mwait should NEVER be allowed to stop the host CPU
	  (and, according to the Intel manual, it's HARD to try and
	  make it do so, which I think is on purpose !)

	- instead, guest-mode mwait should map to a host-side
	  condition-wait (where a write to a monitor-ed area
	  acts as condition-signal).

		- the most likely way to implement something like that
		  would be to write-protect pages and handle write faults
		- and I never got it working *properly* (but I'm a n00b,
		  so that ain't saying much :)
		- but the granularity would be all wrong compared to any
		  real CPU (1 page >> typical monitored area size)
		- but I still don't see it being any better than
		  hlt-based idle, even if we *did* get it to work correctly !!!
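
To illustrate the "instant spurious wakeup" point from the recap, here
is a toy userspace model of an mwait-style idle loop running on top of
nop-emulated instructions. Everything in it is an assumption made for
the sake of the example: nop_monitor() and nop_mwait() are stand-ins
for the emulated instructions, and the "write from another CPU" is
faked by setting the flag after a fixed number of wakeups. It is not
how any real guest or KVM implements this.

```c
#include <stdbool.h>

static volatile int wakeup_flag;    /* the "monitored" location */
static unsigned long spins;         /* how many times mwait returned */
static unsigned long writer_after;  /* fake remote write on this wakeup */

/* monitor emulated as nop: nothing gets armed. */
static void nop_monitor(const volatile void *addr)
{
	(void)addr;
}

/* mwait emulated as nop: returns immediately, i.e. a spurious wakeup. */
static void nop_mwait(void)
{
	/* Stand-in for another CPU eventually writing the monitored area. */
	if (++spins == writer_after)
		wakeup_flag = 1;
}

/*
 * A correctly written idle loop: it re-checks the condition after
 * every wakeup, so it still terminates correctly with nop emulation.
 * n must be >= 1. Returns the number of (wasted) wakeups.
 */
static unsigned long idle_until_woken(unsigned long n)
{
	spins = 0;
	writer_after = n;
	wakeup_flag = 0;

	while (!wakeup_flag) {
		nop_monitor(&wakeup_flag);
		if (wakeup_flag)   /* re-check between monitor and mwait */
			break;
		nop_mwait();
	}
	return spins;
}
```

The loop exits as soon as the flag is set, so the result is correct,
but every iteration before that is a busy spin: that is the 100% host
CPU cost described above.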



I'll look into ENABLE_CAP, and how to expose that on the qemu command
line (I think I might need both methods mentioned by Alex in tandem,
but I'll have to study existing examples before I can say anything
useful here). Any extra words of wisdom on how to do that, what
examples might be best to study for inspiration, etc, much appreciated !!!
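
For reference, my current guess at what the userspace side of the
ENABLE_CAP approach might look like is below. KVM_ENABLE_CAP and
struct kvm_enable_cap are the existing interface from <linux/kvm.h>;
the capability number MY_KVM_CAP_MWAIT_NOP and the meaning of args[0]
are entirely made up for this sketch, and whether the ioctl should go
to the vcpu fd at all is an assumption on my part.

```c
#include <string.h>
#include <sys/ioctl.h>
#include <linux/kvm.h>

/*
 * HYPOTHETICAL capability number, invented for this sketch only.
 * No such capability exists in <linux/kvm.h>.
 */
#define MY_KVM_CAP_MWAIT_NOP 0x7fff0001u

/* Fill in the request; args[0] = 1 enables nop emulation (assumed meaning). */
static void fill_mwait_nop_cap(struct kvm_enable_cap *cap, int enable)
{
	memset(cap, 0, sizeof(*cap));
	cap->cap = MY_KVM_CAP_MWAIT_NOP;
	cap->args[0] = enable ? 1 : 0;
}

/* Issue the ioctl on one vcpu fd; returns 0 on success, -1 on error. */
static int enable_mwait_nop(int vcpu_fd, int enable)
{
	struct kvm_enable_cap cap;

	fill_mwait_nop_cap(&cap, enable);
	return ioctl(vcpu_fd, KVM_ENABLE_CAP, &cap);
}
```

qemu would call something like enable_mwait_nop() once per vcpu, and
only for guests started with the corresponding command line option.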

Thanks,
--Gabriel