2017-03-21 9:47 GMT+08:00 Longpeng (Mike) <longpeng2@xxxxxxxxxx>:
> Hi Radim,
>
> On 2017/3/20 23:18, Radim Krčmář wrote:
>> 2017-03-17 13:22+0800, Longpeng (Mike):
>>> Hi Radim,
> ...
>>> In my humble opinion:
>>>
>>> 1) As "Intel SDM vol3 ch25.3" says, MWAIT operates normally (which I
>>> think includes entering deeper sleep states) under certain conditions.
>>> Some deeper sleep states (such as C4E/C6/C7) will clear the L1/L2/L3
>>> cache. This is insecure unless we take other protective measures (such
>>> as limiting the guest's max C-state; fortunately the power subsystem
>>> isn't supported by QEMU, but we should be careful with special-purpose
>>> setups just in case). HLT in guest mode, by contrast, can't put the
>>> hardware into a sleep state.
>>
>> Good point.  I'm not aware of any VMX capabilities to prevent deeper
>> C-states, so we'd always hope that guests obey provided information.
>
> I'll do some tests this weekend.
> I plan to use MWAIT to enter deeper C-states in a testcase of
> kvm-unit-tests, and start a memory-sensitive workload on another
> hyper-thread, then use intel-pcm or perf to observe the count of cache
> misses on that core.
>
>>> 2) According to "Intel SDM vol3 ch26.3.3 & ch27.5.6", I think MONITOR
>>> in guest mode can't work as perfectly as on the host sometimes.
>>> For example, suppose a vCPU MONITORs an address and then MWAITs; if an
>>> external interrupt (one that doesn't cause any virtual event to be
>>> injected) causes a VMEXIT, the monitored address is cleared, so the
>>> MWAIT can no longer be woken up by a store to the monitored address.
>>
>> It's not as perfect, but should not cause a bug (well, there is a
>> discussion with suspicious MWAIT behavior :]).
>> MWAIT on all Intels I tested would just behave as a nop if an exit
>> happened between MONITOR and MWAIT, like it does if you skip the
>> MONITOR (MWAIT instruction description):
>>
>>   If the preceding MONITOR instruction did not successfully arm an
>>   address range or if the MONITOR instruction has not been executed
>>   prior to executing MWAIT, then the processor will not enter the
>>   implementation-dependent-optimized state. Execution will resume at
>>   the instruction following the MWAIT.
>
> OK. :)
>
>>> But I'm glad to do some tests if time permits, thanks :)
>>>
>>> Radim, how about making HLT-exiting configurable again upstream? If
>>> you like it, there is a problem that should be resolved: async PF
>>> conflicts with "HLT-exiting = 0" in certain situations.
>>
>> Go ahead.  KVM should provide access to hardware features and
>> no-HLT-exiting is reasonable as a per-VM (even per-VCPU if you make a
>> good case) capability.  I'm interested in the asyncpf conflict.

After async PF is set up successfully, there is a broadcast wakeup with a
special token 0xffffffff which tells the vCPU that it should wake up all
processes waiting for APFs, even though no process is actually waiting at
the moment.

Refer to SDM 26.3.1.5: HLT. The only events allowed are the following:
— Those with interruption type external interrupt or non-maskable
  interrupt (NMI).
— Those with interruption type hardware exception and vector 1 (debug
  exception) or vector 18 (machine-check exception).
— Those with interruption type other event and vector 0 (pending MTF VM
  exit).

So if the guest activity state is HLT and a #PF event is delivered during
vmentry, the vmentry will fail. Refer to the original "KVM: VMX: add
module parameter to avoid trapping HLT instructions"
https://www.spinics.net/lists/kvm-commits/msg00137.html: it manually sets
the guest activity state to active beforehand if it is HLT. Actually I
wonder who sets the guest activity state to active when HLT-exiting is
enabled.
In addition, what's your design for a per-VM non-HLT-exiting capability?

> I had some offline discussion with Wanpeng Li, he's interested in
> writing a patch for this feature. :)

Thanks Longpeng. :)

Regards,
Wanpeng Li