We saw a strange issue with the local APIC timer. A random CPU stops
receiving local APIC timer interrupts, which causes various problems.
The CPU uses TSC-Deadline mode for the local APIC timer and the APIC is
in xAPIC mode. When this happens, manually writing the TSC_DEADLINE MSR
triggers the interrupt again and the system returns to normal.

So far we have only seen this issue on E5-2660 v2 and E5-2680 v2 CPUs.
The compiler version seems to matter too; the issue is quite easy to
reproduce with gcc 4.7.

Since the local APIC timer interrupt count on the affected CPU is 0, we
either lose the first interrupt or the TSC_DEADLINE MSR isn't set
correctly. After some debugging, we believe it's the serialization
issue described in the Intel SDM. In xAPIC mode, the write to APIC LVTT
and the write to TSC_DEADLINE are not serialized. Debugging shows that
reading the TSC_DEADLINE MSR right after the very first MSR write
returns 0 on the buggy CPU.

The patch uses the algorithm described in the Intel SDM. The issue only
happens in xAPIC mode, but I guess it isn't worth checking the APIC
mode, so the patch does not check it.

Without this patch, we see the issue after ~5 reboots. With it, we
don't see it in a 24-hour reboot test.

Cc: Suresh Siddha <suresh.b.siddha@xxxxxxxxx>
Cc: Thomas Gleixner <tglx@xxxxxxxxxxxxx>
Cc: H. Peter Anvin <hpa@xxxxxxxxx>
Cc: Ingo Molnar <mingo@xxxxxxxxxx>
Cc: stable@xxxxxxxxxxxxxxx v3.7+
Signed-off-by: Shaohua Li <shli@xxxxxx>
---
 arch/x86/kernel/apic/apic.c | 16 ++++++++++++++++
 1 file changed, 16 insertions(+)

diff --git a/arch/x86/kernel/apic/apic.c b/arch/x86/kernel/apic/apic.c
index dcb5285..b7890b3 100644
--- a/arch/x86/kernel/apic/apic.c
+++ b/arch/x86/kernel/apic/apic.c
@@ -336,6 +336,22 @@ static void __setup_APIC_LVTT(unsigned int clocks, int oneshot, int irqen)
 	apic_write(APIC_LVTT, lvtt_value);
 
 	if (lvtt_value & APIC_LVT_TIMER_TSCDEADLINE) {
+		u64 msr;
+
+		/*
+		 * See Intel SDM: TSC-Deadline Mode chapter. In xAPIC mode,
+		 * writing APIC LVTT and TSC_DEADLINE MSR isn't serialized.
+		 * This uses the algorithm described in Intel SDM to serialize
+		 * the two writes
+		 * */
+		while (1) {
+			wrmsrl(MSR_IA32_TSC_DEADLINE, -1L);
+			rdmsrl(MSR_IA32_TSC_DEADLINE, msr);
+			if (msr)
+				break;
+		}
+		wrmsrl(MSR_IA32_TSC_DEADLINE, 0);
+
 		printk_once(KERN_DEBUG "TSC deadline timer enabled\n");
 		return;
 	}
-- 
1.8.5.6
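
For readers who want to reason about the retry loop in the patch without
affected hardware, here is a minimal user-space sketch of the same
write/read-back pattern. It is only a model, not kernel code. It assumes
the behaviour the commit message attributes to the SDM: until the LVTT
write has taken effect, a write to the deadline MSR is dropped and the MSR
reads back 0. The type fake_lapic, its lvtt_delay field, and the fake_*
helpers are hypothetical names invented for this illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of the relevant hardware state (not a kernel type). */
struct fake_lapic {
	unsigned int lvtt_delay; /* MSR writes that arrive before the LVTT write lands */
	uint64_t tsc_deadline;   /* modelled value of the TSC_DEADLINE MSR */
};

/* Modelled MSR write: dropped while the LVTT write has not taken effect. */
static void fake_wrmsr_deadline(struct fake_lapic *hw, uint64_t val)
{
	if (hw->lvtt_delay > 0) {
		hw->lvtt_delay--; /* write ignored, MSR keeps reading 0 */
		return;
	}
	hw->tsc_deadline = val;
}

/* Modelled MSR read. */
static uint64_t fake_rdmsr_deadline(const struct fake_lapic *hw)
{
	return hw->tsc_deadline;
}

/*
 * Same shape as the loop in the patch: write a far-future deadline, read
 * it back, and retry until the read is non-zero, then write 0 so that no
 * deadline is left armed. Returns the number of write attempts needed.
 */
static unsigned int serialize_lvtt_and_deadline(struct fake_lapic *hw)
{
	unsigned int attempts = 0;
	uint64_t msr;

	assert(hw != NULL);

	do {
		fake_wrmsr_deadline(hw, UINT64_MAX);
		msr = fake_rdmsr_deadline(hw);
		attempts++;
	} while (msr == 0);

	fake_wrmsr_deadline(hw, 0);
	return attempts;
}
```

In this model, a fake_lapic initialised with lvtt_delay = 3 needs four
attempts before the read-back is non-zero, and the final write leaves
tsc_deadline at 0. A single unconditional MSR write, as in the unpatched
code, would have been dropped in that case.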