On 2015-04-13 19:29, Avi Kivity wrote:
> On 04/13/2015 10:01 AM, Jan Kiszka wrote:
>> On 2015-04-07 07:43, Jan Kiszka wrote:
>>> On 2015-04-05 19:12, Valentine Sinitsyn wrote:
>>>> Hi Jan,
>>>>
>>>> On 05.04.2015 13:31, Jan Kiszka wrote:
>>>>> Studying the VM exit logic of Jailhouse, I was wondering when AMD's
>>>>> vmload/vmsave can be avoided. Jailhouse as well as KVM currently use
>>>>> these instructions unconditionally. However, I think both only need
>>>>> GS.base, i.e. the per-cpu base address, to be saved and restored if
>>>>> neither a user-space exit nor a CPU migration is involved (both
>>>>> conditions always hold for Jailhouse). Xen avoids vmload/vmsave on
>>>>> lightweight exits, but it also still uses rsp-based per-cpu variables.
>>>>>
>>>>> So the question boils down to what is generally faster:
>>>>>
>>>>> A) vmload
>>>>>    vmrun
>>>>>    vmsave
>>>>>
>>>>> B) wrmsrl(MSR_GS_BASE, guest_gs_base)
>>>>>    vmrun
>>>>>    rdmsrl(MSR_GS_BASE, guest_gs_base)
>>>>>
>>>>> Of course, KVM also has to take into account that heavyweight exits
>>>>> still require vmload/vmsave and thus become more expensive with B)
>>>>> due to the additional MSR accesses.
>>>>>
>>>>> Any thoughts or results of previous experiments?
>>>> That's a good question; I also thought about it when I was finalizing
>>>> the Jailhouse AMD port. I tried "lightweight exits" with apic-demo, but
>>>> it didn't seem to affect the latency in any noticeable way. That's why I
>>>> decided not to push the patch (in fact, I was even unable to find it
>>>> now).
>>>>
>>>> Note, however, that how AMD chips store host state during VM switches
>>>> is implementation-specific. I did my quick experiments on one CPU only,
>>>> so your mileage may vary.
>>>>
>>>> Regarding your question, I feel B will be faster anyway, but again, I'm
>>>> afraid that the gain could be within the statistical error of the
>>>> experiment.
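To make the trade-off between A) and B) concrete, here is a minimal, hypothetical per-exit cost model in C. The struct and function names are illustrative assumptions, and the cycle costs are inputs to be filled in from measurements on the target CPU, not measured values; since host state handling is implementation-specific, the numbers will differ between CPU types.

```c
#include <assert.h>

/*
 * Hypothetical per-exit cost model for options A) and B).
 * All costs are in CPU cycles and must come from measurements on the
 * target CPU; nothing here is a measured value.
 */
struct exit_costs {
	double vmload_vmsave; /* cost of the vmload + vmsave pair (option A) */
	double gs_base_msrs;  /* cost of wrmsrl + rdmsrl of MSR_GS_BASE (option B) */
};

/* Option A: every exit pays for vmload/vmsave. */
double cost_a(const struct exit_costs *c)
{
	return c->vmload_vmsave;
}

/*
 * Option B: every exit pays for the two MSR accesses; the fraction
 * heavy_frac of heavyweight exits (user-space exit, CPU migration)
 * additionally pays for vmload/vmsave.
 */
double cost_b(const struct exit_costs *c, double heavy_frac)
{
	assert(heavy_frac >= 0.0 && heavy_frac <= 1.0);
	return c->gs_base_msrs + heavy_frac * c->vmload_vmsave;
}

/*
 * Fraction of heavyweight exits at which both options cost the same:
 *   gs_base_msrs + h * vmload_vmsave == vmload_vmsave
 *   =>  h = 1 - gs_base_msrs / vmload_vmsave
 * Below this fraction, option B is cheaper on average. A negative result
 * means the MSR accesses alone cost more than vmload/vmsave, so B never
 * pays off.
 */
double break_even_heavy_frac(const struct exit_costs *c)
{
	assert(c->vmload_vmsave > 0.0);
	return 1.0 - c->gs_base_msrs / c->vmload_vmsave;
}
```

For instance, with placeholder costs of 400 cycles for the vmload/vmsave pair and 200 cycles for the two MSR accesses (chosen only for easy arithmetic), break_even_heavy_frac() returns 0.5: B wins as long as fewer than half of all exits are heavyweight. For Jailhouse, where heavy_frac is always 0, B wins whenever the MSR accesses are cheaper than vmload/vmsave.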
>>> B is faster, by at least 160 cycles with hot caches on an AMD A6-5200
>>> APU and more towards 600 if the caches are colder (added some usleep
>>> to each loop in the test).
>>>
>>> I've tested via vmmcall from guest userspace under Jailhouse. KVM should
>>> be adjustable in a similar way. The benchmark is attached; the patch will
>>> be in the Jailhouse next branch soon. We need to check more CPU types,
>>> though.
>> Avi, I found some preparatory patches of yours from 2010 [1]. Do you
>> happen to remember if it was never completed for a technical reason?
>
> IIRC, I came to the conclusion that it was impossible. Something about
> TR.size not receiving a reasonable value. Let me see.

To my understanding, TR doesn't play a role until we leave ring 0 again.
Or what could make the CPU look at any of the fields in the 64-bit TSS
before that?

Jan

-- 
Siemens AG, Corporate Technology, CT RTC ITP SES-DE
Corporate Competence Center Embedded Linux