Re: [PATCH v4] kvm: better MWAIT emulation for guests

Radim Krčmář <rkrcmar@xxxxxxxxxx> · Wed, 15 Mar 2017 21:46:18 +0100

2017-03-15 16:21-0400, Gabriel L. Somlo:
> On Wed, Mar 15, 2017 at 09:13:49PM +0100, Radim Krčmář wrote:
>> 2017-03-15 21:28+0200, Michael S. Tsirkin:
>> > Guests running Mac OS 5, 6, and 7 (Leopard through Lion) have a problem:
>> > unless explicitly provided with kernel command line argument
>> > "idlehalt=0" they'd implicitly assume MONITOR and MWAIT availability,
>> > without checking CPUID.
>> > 
>> > We currently emulate that as a NOP but on VMX we can do better: let
>> > guest stop the CPU until timer, IPI or memory change.  CPU will be busy
>> > but that isn't any worse than a NOP emulation.
>> > 
>> > Note that mwait within guests is not the same as on real hardware
>> > because halt causes an exit while mwait doesn't.  For this reason it
>> > might not be a good idea to use the regular MWAIT flag in CPUID to
>> > signal this capability.  Add a flag in the hypervisor leaf instead.
>> > 
>> > Additionally, we add a capability for QEMU - e.g. if it knows there's an
>> > isolated CPU dedicated for the VCPU it can set the standard MWAIT flag
>> > to improve guest behaviour.
>> > 
>> > Reported-by: "Gabriel L. Somlo" <gsomlo@xxxxxxxxx>
>> > Signed-off-by: Michael S. Tsirkin <mst@xxxxxxxxxx>
>> > ---
>> > 
>> > Note: SVM bits are untested at this point. Seems pretty
>> > obvious though.
>> > 
>> > changes from v3:
>> > - don't enable capability if cli+mwait blocks interrupts
>> > - doc typo fixes (drop drop ppc doc)
>> > 
>> > changes from v2:
>> > - add a capability to allow host userspace to detect new kernels
>> > - more documentation to clarify the semantics of the feature flag
>> >   and why it's useful
>> > - svm support as suggested by Radim
>> > 
>> > changes from v1:
>> > - typo fix resulting in rest of leaf flags being overwritten
>> >   Reported by: Wanpeng Li <kernellwp@xxxxxxxxx>
>> > - updated commit log with data about guests helped by this feature
>> > - better document differences between mwait and halt for guests
>> > 
>> > diff --git a/arch/x86/kvm/x86.h b/arch/x86/kvm/x86.h
>> > @@ -212,4 +213,28 @@ static inline u64 nsec_to_cycles(struct kvm_vcpu *vcpu, u64 nsec)
>> >  	    __rem;						\
>> >  	 })
>> >  
>> > +static bool kvm_mwait_in_guest(void)
>> > +{
>> > +	unsigned int eax, ebx, ecx;
>> > +
>> > +	if (!cpu_has(&boot_cpu_data, X86_FEATURE_MWAIT))
>> > +		return -ENODEV;
>> > +
>> > +	if (boot_cpu_data.x86_vendor != X86_VENDOR_INTEL)
>> > +		return -ENODEV;
>> > +
>> > +	/*
>> > +	 * Intel CPUs without CPUID5_ECX_INTERRUPT_BREAK are problematic as
>> > +	 * they would allow guest to stop the CPU completely by disabling
>> > +	 * interrupts then invoking MWAIT.
>> > +	 */
>> > +	if (boot_cpu_data.cpuid_level < CPUID_MWAIT_LEAF)
>> > +		return -ENODEV;
>> > +
>> > +	cpuid(CPUID_MWAIT_LEAF, &eax, &ebx, &ecx, &mwait_substates);
>> > +
>> > +	if (!(ecx & CPUID5_ECX_INTERRUPT_BREAK))
>> > +		return -ENODEV;
>> 
>> The guest is still able to set ecx=0 with MWAIT, which should be the
>> same as not having the CPUID flag, so I'm wondering how this check
>> prevents anything harmful ... is it really a cpu "feature"?
>> 
>> If we somehow report ecx bit 1 in CPUID[5], then the guest might try to
>> set ecx bit 0 for MWAIT, which will cause #GP(0) and could explain the
>> hang that Gabriel is hitting.
>> 
>> Gabriel,
>> 
>>  - do you see VM exits on the "hung" VCPU?
> 
> how would I go about looking ?

Probably the simplest would be to try install trace-cmd and do

  trace-cmd record -e kvm:kvm_entry

for a while, then killing it with ^C and calling

  trace-cmd report

it will have lines like:

  CPU-3729  [021]  5046.222480: kvm_entry:            vcpu 7

the first column is task id, the last one is VCPU id, so you'd verify that all
VCPUs eventually enter.
You could either look at the taks id of VCPUs that are running 100% of the time
or just pipe through `grep -o 'vcpu.*' | sort -u` and hope that all of them are
running, so you'd see nice list of them all. :)

If you don't see some VCPUs there, they might have been halted, which would
show on

  virsh qemu-monitor-command rhel7 --hmp info cpus

If there are running, but never entering VCPUs, then running

  trace-cmd record -e kvm:\*

to capture all kvm events could tell us why it is stuck.

>>  - what is your CPU model?
> 
> $ cat /proc/cpuinfo
> model name      : Intel(R) Xeon(R) CPU            5150  @ 2.66GHz
> 
> (this is 2x dual-core Xeon on a Mac Pro 1,1 -- all I had to spare for
> testing, to avoid having to reboot my primary desktop :)

Oh, ancient when it comes to VMX. :)

>>  - what do you get after running this C program on host and guest?
> 
> eax=0x000040 ebx=0x000040 ecx=0x000003 edx=0x000020

Your CPU has the CPUID5_ECX_INTERRUPT_BREAK, thanks.
The guest should see QEMU's default, which is

  eax=0x000000 ebx=0x000000 ecx=0x000003 edx=0x000000

Thanks.