On 22/05/2016, Wanpeng Li <kernellwp@xxxxxxxxx> wrote: > 2016-05-22 9:23 GMT+08:00 Wanpeng Li <kernellwp@xxxxxxxxx>: >> 2016-05-20 19:47 GMT+08:00 Alan Jenkins >> <alan.christopher.jenkins@xxxxxxxxx>: >>> Hi >>> >>> I'm getting a hard lockup in wait_lapic_expire(). I'm not certain why, >>> and it didn't seem to reproduce on a second setup. However I found a >>> suspect in the code for TSC deadline timers. Maybe someone is interested >>> in my analysis. >>> >>> If a guest has a TSC deadline timer set, it's not re-done when the TSC >>> is adjusted, and will fire at the wrong time. The hrtimer used for >>> emulation is not being recalculated. If the TSC was set _backwards_, I >>> think it could trigger a lockup in wait_lapic_expire(). This function is >>> a busy-loop optimization, which could be tricked into busy-looping for >>> too long. The expected busy-wait is `lapic_timer_advance_ns`, which >>> defaults to 0 (I haven't changed it). >> >> The default clockevent device for hrtimer is lapic, and tsc works as a >> clocksource device, even if tsc in guest is backwards/adjusted, >> clockevent device is not influenced and work normally I think, so we >> just need to keep clockevent device can fire asap when it expires. How >> about someting like below(I can't reproduce your bug and just can send >> out a formal patch after your confirm): I don't understand your explanation. I don't mind testing such a patch, as it will fix the issue I experience :). In my understanding this is only a workaround, so we should also add a (ratelimited) warning message. >> diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c >> index bbb5b28..02a21d3 100644 >> --- a/arch/x86/kvm/lapic.c >> +++ b/arch/x86/kvm/lapic.c >> @@ -1310,7 +1310,7 @@ void wait_lapic_expire(struct kvm_vcpu *vcpu) >> >> /* __delay is delay_tsc whenever the hardware has TSC, thus >> always. */ >> if (guest_tsc < tsc_deadline) >> - __delay(tsc_deadline - guest_tsc); >> + __delay(max((tsc_deadline - guest_tsc), >> lapic_timer_advance_ns)); >> } > > s/max/min wait I found another one >> - __delay(tsc_deadline - guest_tsc); >> __delay( >> tsc_deadline - guest_tsc) (tsc_deadline - guest_tsc) gives a value in terms of guest TSC rate. But guest TSCs are scaled. __delay() must expect a value in terms of host TSC rate. So the busy-wait could take shorter or longer than expected depending on how the guest TSC is scaled. Right? Alan -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html