Comments inline.  Sorry for top-posting.  Gmail is not my normal mode of
LKML processing, but hey.

On Tue, Aug 2, 2011 at 5:54 AM, Nadav Har'El <nyh@xxxxxxxxxx> wrote:
> This patch fixes two corner cases in nested (L2) handling of TSC-related
> issues:
>
> 1. Somewhat suprisingly, according to the Intel spec, if L1 allows WRMSR to
> the TSC MSR without an exit, then this should set L1's TSC value itself - not
> offset by vmcs12.TSC_OFFSET (like was wrongly done in the previous code).
>
> 2. Allow L1 to disable the TSC_OFFSETING control, and then correctly ignore
> the vmcs12.TSC_OFFSET.
>
> Signed-off-by: Nadav Har'El <nyh@xxxxxxxxxx>
> ---
>  arch/x86/kvm/vmx.c |   31 +++++++++++++++++++++----------
>  1 file changed, 21 insertions(+), 10 deletions(-)
>
> --- .before/arch/x86/kvm/vmx.c  2011-08-02 15:51:02.000000000 +0300
> +++ .after/arch/x86/kvm/vmx.c   2011-08-02 15:51:02.000000000 +0300
> @@ -1777,15 +1777,23 @@ static void vmx_set_tsc_khz(struct kvm_v
>   */
>  static void vmx_write_tsc_offset(struct kvm_vcpu *vcpu, u64 offset)
>  {
> -        vmcs_write64(TSC_OFFSET, offset);
> -        if (is_guest_mode(vcpu))
> +        if (is_guest_mode(vcpu)) {
>                  /*
> -                 * We're here if L1 chose not to trap the TSC MSR. Since
> -                 * prepare_vmcs12() does not copy tsc_offset, we need to also
> -                 * set the vmcs12 field here.
> +                 * We're here if L1 chose not to trap WRMSR to TSC. According
> +                 * to the spec, this should set L1's TSC; The offset that L1
> +                 * set for L2 remains unchanged, and still needs to be added
> +                 * to the newly set TSC to get L2's TSC.
>                  */
> -                get_vmcs12(vcpu)->tsc_offset = offset -
> -                        to_vmx(vcpu)->nested.vmcs01_tsc_offset;
> +                struct vmcs12 *vmcs12;
> +                to_vmx(vcpu)->nested.vmcs01_tsc_offset = offset;
> +                /* recalculate vmcs02.TSC_OFFSET: */
> +                vmcs12 = get_vmcs12(vcpu);
> +                vmcs_write64(TSC_OFFSET, offset +
> +                        (nested_cpu_has(vmcs12, CPU_BASED_USE_TSC_OFFSETING) ?
> +                         vmcs12->tsc_offset : 0));
> +        } else {
> +                vmcs_write64(TSC_OFFSET, offset);
> +        }
>  }

This part looks good.
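
To make the arithmetic in this hunk concrete, here is a minimal standalone sketch in plain C. It is not kernel code: struct tsc_model and model_write_tsc_offset() are hypothetical names invented for illustration, and the struct only stands in for the handful of fields the hunk touches.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the vCPU state touched by the hunk above. */
struct tsc_model {
    bool     guest_mode;         /* true while L2 is running */
    bool     l1_uses_tsc_offset; /* L1 set USE_TSC_OFFSETING in vmcs12 */
    uint64_t vmcs01_tsc_offset;  /* offset L0 applies on behalf of L1 */
    uint64_t vmcs12_tsc_offset;  /* offset L1 requested for L2 */
    uint64_t hw_tsc_offset;      /* TSC_OFFSET in the active hardware VMCS */
};

/* Mirrors the branch structure of the patched vmx_write_tsc_offset(). */
static void model_write_tsc_offset(struct tsc_model *m, uint64_t offset)
{
    assert(m);
    if (m->guest_mode) {
        /* The write sets L1's TSC; vmcs12's offset is left untouched. */
        m->vmcs01_tsc_offset = offset;
        /* Recompute what hardware should use while L2 keeps running. */
        m->hw_tsc_offset = offset +
            (m->l1_uses_tsc_offset ? m->vmcs12_tsc_offset : 0);
    } else {
        m->hw_tsc_offset = offset;
    }
}
```

With vmcs12_tsc_offset = 100 and the control enabled, writing an offset of 1000 in guest mode leaves vmcs12_tsc_offset at 100, records 1000 as the L1 offset, and puts 1100 into the hardware field. With the control disabled, the hardware field gets the L1 offset alone.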
>  static void vmx_adjust_tsc_offset(struct kvm_vcpu *vcpu, s64 adjustment)
> @@ -6529,8 +6537,11 @@ static void prepare_vmcs02(struct kvm_vc
>
>          set_cr4_guest_host_mask(vmx);
>
> -        vmcs_write64(TSC_OFFSET,
> -                vmx->nested.vmcs01_tsc_offset + vmcs12->tsc_offset);
> +        if (vmcs12->cpu_based_vm_exec_control & CPU_BASED_USE_TSC_OFFSETING)
> +                vmcs_write64(TSC_OFFSET,
> +                        vmx->nested.vmcs01_tsc_offset + vmcs12->tsc_offset);
> +        else
> +                vmcs_write64(TSC_OFFSET, vmx->nested.vmcs01_tsc_offset);

I need more context here... where do you apply the adjustment?

The offset should be added to the vmcs01_tsc_offset only (but also
written into the hardware VMCS, which should not be preserved when the
guest exits).

>
>          if (enable_vpid) {
>                  /*
> @@ -6937,7 +6948,7 @@ static void nested_vmx_vmexit(struct kvm
>
>          load_vmcs12_host_state(vcpu, vmcs12);
>
> -        /* Update TSC_OFFSET if vmx_adjust_tsc_offset() was used while L2 ran */
> +        /* Update TSC_OFFSET if TSC was changed while L2 ran */
>          vmcs_write64(TSC_OFFSET, vmx->nested.vmcs01_tsc_offset);
>
>          /* This is needed for same reason as it was needed in prepare_vmcs02 */
>

This is correct.  You should always restore the L1 offset when exiting
if it might have changed.  This implies also that you must update
vmx->nested.vmcs01_tsc_offset if you receive a call to
vmx_adjust_tsc_offset while L2 is running, which is why I wanted to
see more context above.

Zach
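
The invariant asked for in this review can be sketched as a small standalone C model. This is only an illustration under stated assumptions, not the actual vmx_adjust_tsc_offset() body (which is not shown in the quoted context): struct tsc_state, model_adjust_tsc_offset() and model_exit_to_l1() are hypothetical names, and the sketch assumes an adjustment arriving while L2 runs is applied both to the hardware field and to the saved L1 offset.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the offsets discussed in the review. */
struct tsc_state {
    bool     guest_mode;        /* true while L2 is running */
    uint64_t vmcs01_tsc_offset; /* saved L1 offset, restored on exit */
    uint64_t hw_tsc_offset;     /* TSC_OFFSET in the active hardware VMCS */
};

/* Assumed behaviour: adjust hardware now, and remember it for L1 too. */
static void model_adjust_tsc_offset(struct tsc_state *s, int64_t adjustment)
{
    /* Unsigned addition wraps, so a negative adjustment subtracts. */
    s->hw_tsc_offset += (uint64_t)adjustment;
    if (s->guest_mode)
        s->vmcs01_tsc_offset += (uint64_t)adjustment;
}

/* On nested exit the hardware field is overwritten with L1's offset. */
static void model_exit_to_l1(struct tsc_state *s)
{
    assert(s->guest_mode);
    s->guest_mode = false;
    s->hw_tsc_offset = s->vmcs01_tsc_offset;
}
```

Starting from an L1 offset of 1000 and a hardware value of 1100 (an L2 offset of 100 on top), an adjustment of -50 during L2 gives 950 and 1050. After the exit the hardware field holds 950, so the adjustment survives the exit while the L2-only part of the offset does not.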