On Fri, Jan 25, 2019 at 7:42 AM Sean Christopherson <sean.j.christopherson@xxxxxxxxx> wrote:
>
> Currently, host_rsp is cached on a per-vCPU basis, i.e. it's stored in
> struct vcpu_vmx. In non-nested usage the caching is for all intents
> and purposes 100% effective, e.g. only the first VMLAUNCH needs to
> synchronize VMCS.HOST_RSP since the call stack to vmx_vcpu_run() is
> identical each and every time. But when running a nested guest, KVM
> must invalidate the cache when switching the current VMCS as it can't
> guarantee the new VMCS has the same HOST_RSP as the previous VMCS. In
> other words, the cache loses almost all of its efficacy when running a
> nested VM.
>
> Move host_rsp to struct vmcs_host_state, which is per-VMCS, so that it
> is cached on a per-VMCS basis and restores its 100% hit rate when
> nested VMs are in play.
>
> Note that the host_rsp cache for vmcs02 essentially "breaks" when
> nested early checks are enabled as nested_vmx_check_vmentry_hw() will
> see a different RSP at the time of its VM-Enter. While it's possible
> to avoid even that VMCS.HOST_RSP synchronization, e.g. by employing a
> dedicated VM-Exit stack, there is little motivation for doing so as
> the overhead of two VMWRITEs (~55 cycles) is dwarfed by the overhead
> of the extra VMX transition (600+ cycles) and is a proverbial drop in
> the ocean relative to the total cost of a nested transition (10s of
> thousands of cycles).
>
> Signed-off-by: Sean Christopherson <sean.j.christopherson@xxxxxxxxx>

Reviewed-by: Jim Mattson <jmattson@xxxxxxxxxx>