2016-07-08 21:58 GMT+08:00 Wanpeng Li <kernellwp@xxxxxxxxx>: > 2016-07-08 18:18 GMT+08:00 Paolo Bonzini <pbonzini@xxxxxxxxxx>: >> >> >> On 08/07/2016 02:38, Wanpeng Li wrote: >>> 2016-07-07 22:11 GMT+08:00 Paolo Bonzini <pbonzini@xxxxxxxxxx>: >>>> >>>> >>>> On 07/07/2016 15:23, Wanpeng Li wrote: >>>>>>>>> >>>>>>>>> if (kvm_lapic_hv_timer_in_use(vcpu) && >>>>>>>>> + (is_guest_mode(vcpu) || >>>>>>>>> kvm_x86_ops->set_hv_timer(vcpu, >>>>>>>>> - kvm_get_lapic_tscdeadline_msr(vcpu))) >>>>>>>>> + kvm_get_lapic_tscdeadline_msr(vcpu)))) >>>>>>>>> kvm_lapic_switch_to_sw_timer(vcpu); >>>>>>>>> if (check_tsc_unstable()) { >>>>>>>>> u64 offset = kvm_compute_tsc_offset(vcpu, >>>>>>>>> >>>>>>> >>>>>>> Thanks, this is good as a fallback. I'll try to fix it by getting the >>>>>>> pin-based execution controls right but if I fail this patch is okay. >>>>> I believe we still need this patch even if you implement "L1 TSC >>>>> deadline timer to trigger while L2 is running" eventually, the codes >>>>> you posted before: >>>>> >>>>> exec_control = vmcs12->pin_based_vm_exec_control; >>>>> +exec_control &= ~PIN_BASED_VMX_PREEMPTION_TIMER; >>>>> exec_control |= vmcs_config.pin_based_exec_ctrl; >>>>> - exec_control &= ~PIN_BASED_VMX_PREEMPTION_TIMER; >>>>> + if (vmx->hv_deadline_tsc == -1) >>>>> + exec_control &= ~PIN_BASED_VMX_PREEMPTION_TIMER; >>>>> >>>>> So there is still case the preemption timer bit of vmcs02 is not set, >>>>> however, the scenario I mentioned above in kvm_arch_vcpu_load() will >>>>> set it unnecessary. >>>> >>>> kvm_x86_ops->set_hv_timer _will_ set the preemption timer bit of vmcs02 >>>> if vmcs02 is the loaded one. >>>> >>>> This can happen if L2 has access to L1's local APIC registers (i.e. L1 >>>> passes the local APIC instead of emulating it, as is the case in a >>>> partitioning hypervisor). While L2 runs, it writes to the TSC deadline >>>> MSR of L1. This causes a call to kvm_x86_ops->set_hv_timer while the >>>> active VMCS is a vmcs02. >>> >>> Yes, in the scenario you pointed out the call to >>> kvm_x86_ops->set_hv_timer while the active VMCS is vmcs02 is correct, >>> however, in the scenario I mentioned in the patch description is not >>> correct even if enable "L1 TSC deadline timer to trigger while L2 is >>> running". >> >> It doesn't help that you have not explained how to reproduce the >> bug---this is what the cover letter and commit messages are for, too. > > I believe you pointed out this before: > > | The patch looks correct, but I'm not sure how you get a preemption > | timer vmexit while vmcs02 is active: > | > | exec_control = vmcs12->pin_based_vm_exec_control; > | exec_control |= vmcs_config.pin_based_exec_ctrl; > | exec_control &= ~PIN_BASED_VMX_PREEMPTION_TIMER; > > We can't get preemption timer vmexit which vmcs02 is loaded since > preemtion timer bit in vmcs02 is not set, then how can we get the > incorrect preemption timer vmexit in nested guest which patch 1 fixed? > Because the scenario I mentioned in patch 2 set vmcs02. > > w/o patch1 + w/o enable "L1 TSC deadline timer to trigger while L2 is > running" => we will get the bug calltrace mentioned in patch1 since > incorrect vmcs02 bit is set due to the bug mentioned in patch 2. So > apply patch2 can fix it. > > However, after enable "L1 TSC deadline timer to trigger while L2 is > running", we should have patch 1 to correctly capture nested vmexit > for preemption timer. > > My setup is L0 and L1 both are latest kvm queue branch w/ Yunhong's > preemption timer enable patches and my previous "fix missed > cancellation of TSC deadline timer" patches. I always enable > preemption timer in L0, but try to enable or disable preemption timer > in L1. Btw, my L1 is a full dynticks guest in order that hrtimer in L1 will be heavily used. Regards, Wanpeng Li -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html