Re: SVM: vmload/vmsave-free VM exits?

Jan Kiszka <jan.kiszka@xxxxxxxxxxx> · Mon, 13 Apr 2015 19:35:13 +0200

On 2015-04-13 19:29, Avi Kivity wrote:
> On 04/13/2015 10:01 AM, Jan Kiszka wrote:
>> On 2015-04-07 07:43, Jan Kiszka wrote:
>>> On 2015-04-05 19:12, Valentine Sinitsyn wrote:
>>>> Hi Jan,
>>>>
>>>> On 05.04.2015 13:31, Jan Kiszka wrote:
>>>>> studying the VM exit logic of Jailhouse, I was wondering when AMD's
>>>>> vmload/vmsave can be avoided. Jailhouse as well as KVM currently use
>>>>> these instructions unconditionally. However, I think both only need
>>>>> GS.base, i.e. the per-cpu base address, to be saved and restored if no
>>>>> user space exit or no CPU migration is involved (both are always true
>>>>> for Jailhouse). Xen avoids vmload/vmsave on lightweight exits but it also
>>>>> still uses rsp-based per-cpu variables.
>>>>>
>>>>> So the question boils down to what is generally faster:
>>>>>
>>>>> A) vmload
>>>>>      vmrun
>>>>>      vmsave
>>>>>
>>>>> B) wrmsrl(MSR_GS_BASE, guest_gs_base)
>>>>>      vmrun
>>>>>      rdmsrl(MSR_GS_BASE, guest_gs_base)
>>>>>
>>>>> Of course, KVM also has to take into account that heavyweight exits
>>>>> still require vmload/vmsave, thus become more expensive with B) due to
>>>>> the additional MSR accesses.
>>>>>
>>>>> Any thoughts or results of previous experiments?
>>>> That's a good question, I also thought about it when I was finalizing
>>>> Jailhouse AMD port. I tried "lightweight exits" with apic-demo but it
>>>> didn't seem to affect the latency in any noticeable way. That's why I
>>>> decided not to push the patch (in fact, I was even unable to find it
>>>> now).
>>>>
>>>> Note however that how AMD chips store host state during VM switches is
>>>> implementation-specific. I did my quick experiments on one CPU only, so
>>>> your mileage may vary.
>>>>
>>>> Regarding your question, I feel B will be faster anyways but again I'm
>>>> afraid that the gain could be within statistical error of the
>>>> experiment.
>>> It is faster: at least 160 cycles with hot caches on an AMD A6-5200 APU,
>>> and closer to 600 if they are colder (I added some usleep to each loop in
>>> the test).
>>>
>>> I've tested via vmmcall from guest userspace under Jailhouse. KVM should
>>> be adjustable in a similar way. Attached the benchmark, patch will be in
>>> the Jailhouse next branch soon. We need to check more CPU types, though.
>> Avi, I found some preparatory patches of yours from 2010 [1]. Do you
>> happen to remember if it was never completed for a technical reason?
> 
> IIRC, I came to the conclusion that it was impossible.  Something about
> TR.size not receiving a reasonable value.  Let me see.

To my understanding, TR doesn't play a role until we leave ring 0 again.
Or what could make the CPU look for any of the fields in the 64-bit TSS
before that?

Jan

-- 
Siemens AG, Corporate Technology, CT RTC ITP SES-DE
Corporate Competence Center Embedded Linux