Re: [PATCH v5 untested] kvm: better MWAIT emulation for guests

Radim Krčmář <rkrcmar@xxxxxxxxxx> · Wed, 29 Mar 2017 14:11:47 +0200

2017-03-28 13:35-0700, Jim Mattson:
> On Tue, Mar 28, 2017 at 7:28 AM, Radim Krčmář <rkrcmar@xxxxxxxxxx> wrote:
>> 2017-03-27 15:34+0200, Alexander Graf:
>>> On 15/03/2017 22:22, Michael S. Tsirkin wrote:
>>>> Guests running Mac OS X 10.5, 10.6, and 10.7 (Leopard through Lion)
>>>> have a problem: unless explicitly provided with kernel command line
>>>> argument "idlehalt=0" they'd implicitly assume MONITOR and MWAIT
>>>> availability, without checking CPUID.
>>>>
>>>> We currently emulate that as a NOP but on VMX we can do better: let
>>>> guest stop the CPU until timer, IPI or memory change.  CPU will be busy
>>>> but that isn't any worse than a NOP emulation.
>>>>
>>>> Note that mwait within guests is not the same as on real hardware
>>>> because halt causes an exit while mwait doesn't.  For this reason it
>>>> might not be a good idea to use the regular MWAIT flag in CPUID to
>>>> signal this capability.  Add a flag in the hypervisor leaf instead.
>>>
>>> So imagine we had proper MWAIT emulation capabilities based on page faults.
>>> In that case, we could do something as fancy as
>>>
>>> Treat MWAIT as pass-through by default
>>>
>>> Have a per-vcpu monitor timer 10 times a second in the background that
>>> checks which instruction we're in
>>>
>>> If we're in mwait for the last - say - 1 second, switch to emulated MWAIT,
>>> if $IP was in non-mwait within that time, reset counter.
>>
>> Or we could reuse external interrupts for sampling.  Exits triggered by
>> them would check for current instruction (probably would be best to
>> limit just to timer tick) and a sufficient ratio (> 0?) of other exits
>> would imply that MWAIT is not used.
>>
>>> Or instead maybe just reuse the adaptive hlt logic?
>>
>> Emulated MWAIT is very similar to emulated HLT, so reusing the logic
>> makes sense.  We would just add new wakeup methods.
>>
>>> Either way, with that we should be able to get super low latency IPIs
>>> running while still maintaining some sanity on systems which don't have
>>> dedicated CPUs for workloads.
>>>
>>> And we wouldn't need guest modifications, which is a great plus. So older
>>> guests (and Windows?) could benefit from mwait as well.
>>
>> There is no need for guest modifications -- it could be exposed as standard
>> MWAIT feature to the guest, with responsibilities for guest/host-impact
>> on the user.
>>
>> I think that the page-fault based MWAIT would require paravirt if it
>> should be enabled by default, because of performance concerns:
>> Enabling write protection on a page needs a VM exit on all other VCPUs
>> when beginning monitoring (to reload page permissions and prevent missed
>> writes).
>> We'd want to keep trapping writes to the page all the time because
>> toggling is slow, but this could regress performance for an OS that has
>> other data accessed by other VCPUs in that page.
>> No current interface can tell the guest that it should reserve the whole
>> page instead of what CPUID[5] says and that writes to the monitored page
>> are not "cheap", but can trigger a VM exit ...
> 
> CPUID.05H:EBX is supposed to address the false sharing issue. IIRC,
> VMware Fusion reports 64 in CPUID.05H:EAX and 4096 in CPUID.05H:EBX
> when running Mac OS X guests. Per Intel's SDM volume 3, section
> 8.10.5, "To avoid false wake-ups; use the largest monitor line size to
> pad the data structure used to monitor writes. Software must make sure
> that beyond the data structure, no unrelated data variable exists in
> the triggering area for MWAIT. A pad may be needed to avoid this
> situation." Unfortunately, most operating systems do not follow this
> advice.

Right, EBX provides what we need to expose that the whole page is
monitored, thanks!
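
To make the padding rule concrete, here is a rough sketch of what a
well-behaved guest would do with that value.  The helper names are made
up for illustration, and the CPUID.05H:EBX value is passed in as a plain
argument rather than read with CPUID, so none of this is existing kernel
code:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* CPUID.05H:EBX[15:0] is the largest monitor-line size in bytes. */
static size_t monitor_line_max(uint32_t cpuid5_ebx)
{
	return cpuid5_ebx & 0xffffu;
}

/*
 * Hypothetical helper: round data_size up to a whole number of largest
 * monitor lines, so no unrelated data shares the triggering area.
 */
static size_t monitor_padded_size(size_t data_size, uint32_t cpuid5_ebx)
{
	size_t line = monitor_line_max(cpuid5_ebx);

	if (line == 0)		/* no monitor-line size reported */
		return data_size;
	assert(data_size <= SIZE_MAX - (line - 1));	/* no overflow below */
	return (data_size + line - 1) / line * line;
}
```

With EBX = 4096, as in the VMware Fusion case above, an 8-byte wakeup
flag would be padded to a full 4096-byte page.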

>             Unfortunately, most operating systems do not follow this
> advice.

Yeah ... KVM could add yet another heuristic to drop MWAIT emulation and
use hardware if there were many traps while the target was not MWAITING.
It's getting over-complicated, though :/
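
For the sake of discussion, the heuristic could look roughly like the
sketch below.  Every name and the threshold are hypothetical (nothing
here exists in KVM), and resetting the counter on a useful trap is my
own assumption about how "many traps" would be measured:

```c
#include <stdbool.h>
#include <stdint.h>

enum mwait_mode { MWAIT_EMULATED, MWAIT_PASSTHROUGH };

/* Hypothetical per-vCPU state for the fallback heuristic. */
struct mwait_heuristic {
	enum mwait_mode mode;
	uint32_t wasted_traps;	/* write traps while target was not in MWAIT */
	uint32_t max_wasted;	/* threshold, arbitrary tuning assumption */
};

/* Called on each trapped write to the write-protected monitored page. */
static void mwait_on_write_trap(struct mwait_heuristic *h, bool target_in_mwait)
{
	if (h->mode != MWAIT_EMULATED)
		return;

	if (target_in_mwait) {
		/* The trap did its job (woke a waiter): start counting afresh. */
		h->wasted_traps = 0;
		return;
	}

	/* Too many pointless traps in a row: give up on emulation. */
	if (++h->wasted_traps >= h->max_wasted)
		h->mode = MWAIT_PASSTHROUGH;
}
```

This only covers the one-way switch to hardware MWAIT; deciding when to
go back to emulation would need yet more state.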