(2013/12/05 18:28), Paolo Bonzini wrote:
Il 05/12/2013 07:15, Fernando Luis Vázquez Cao ha scritto:
VCPU TSC is not cleared by a warm reset (*), which leaves many Linux
guests vulnerable to the overflow in cyc2ns_offset fixed by upstream
commit 9993bc635d01a6ee7f6b833b4ee65ce7c06350b1 ("sched/x86: Fix overflow
in cyc2ns_offset").
To put it in a nutshell, if a Linux guest without the patch above applied
has been up for more than 208 days and attempts a warm reset, chances are that
the newly booted kernel will panic or hang.
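For reference, the 208-day figure is consistent with a 64-bit overflow, assuming the unpatched kernel scales the nanosecond value by 2^10 (a CYC2NS_SCALE_FACTOR of 10, which is a recollection of kernels from that period and is not stated in this thread). A minimal sketch of the arithmetic, with hypothetical helper names:

```c
#include <stdint.h>

/* Assumption: fixed-point shift used by the cycles-to-ns conversion. */
#define SCALE_SHIFT 10

/*
 * Largest nanosecond count n for which (n << SCALE_SHIFT) still fits in
 * an unsigned 64-bit value. Beyond this, the intermediate product wraps.
 */
uint64_t max_ns_before_overflow(void)
{
    return UINT64_MAX >> SCALE_SHIFT;
}

/* Convert nanoseconds to days (86400 seconds per day). */
double ns_to_days(uint64_t ns)
{
    return (double)ns / 1e9 / 86400.0;
}
```

Under that assumption the limit is 2^54 - 1 ns, which ns_to_days() places between 208 and 209 days. Because the TSC survives the warm reset, the freshly booted kernel starts with a counter that is already past this limit.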
(*) Intel Xeon E5 processors show the same broken behavior due to
the erratum "TSC is Not Affected by Warm Reset" (Intel® Xeon®
Processor E5 Family Specification Update - August 2013): "The
TSC (Time Stamp Counter MSR 10H) should be cleared on
reset. Due to this erratum the TSC is not affected by warm
reset."
Cc: stable@xxxxxxxxxxxxxxx
Cc: Will Auld <will.auld@xxxxxxxxx>
Cc: Marcelo Tosatti <mtosatti@xxxxxxxxxx>
Signed-off-by: Fernando Luis Vazquez Cao <fernando@xxxxxxxxxxxxx>
I agree that the bug is in QEMU. One small nit in your patch is that
you should reset env->tsc_adjust and env->tsc in x86_cpu_reset. This
would already be pretty good.
Yes, that is certainly cleaner (I should try not to take shortcuts...).
I am attaching an updated patch (I apologize for not sending it inline;
for reasons better left untold I am writing this on a problematic email
client :) ).
However, a bigger problem is that env->tsc is a useless duplicate of
"cpu_get_ticks() + env->tsc_adjust". It would be nice to drop env->tsc
completely except for migration backwards compatibility. Thus you can:
- fill in env->tsc as mentioned above from target-i386/machine.c's
cpu_pre_save function. This guarantees backwards compatibility.
- add a function cpu_set_ticks(int64_t ticks) to cpus.c. The function
does nothing if use_icount is true, otherwise it needs to have (roughly)
the opposite logic compared to cpu_get_ticks. You then call this
function from x86_cpu_reset instead of setting env->tsc. You can
similarly call this function from kvm_get_msrs.
- add a function kvm_set_ticks(int64_t ticks) to kvm-all.c and
kvm-stub.c. For kvm-all.c it calls kvm_arch_set_ticks(CPUState *cpu,
int64_t ticks) in target-*/kvm.c. The kvm_arch_set_ticks() function has a
dummy implementation for all architectures except x86. For x86 it calls
KVM_SET_MSRS passing "ticks + env->tsc_offset".
- call kvm_set_ticks() from cpu_set_ticks() and cpu_enable_ticks()
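To make the "opposite logic" suggestion concrete, here is a minimal standalone model, not QEMU code. It assumes the tick counter is computed as a host counter plus an offset while ticks are enabled, and as the frozen offset alone while they are disabled. All names (model_get_ticks, model_set_ticks, fake_host_ticks and so on) are hypothetical, and the KVM hand-off is left out.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-ins for the timer state kept in cpus.c. */
bool use_icount;              /* instruction-counting mode */
bool ticks_enabled = true;    /* false while the VM is stopped */
int64_t ticks_offset;         /* guest ticks minus host ticks */
int64_t fake_host_ticks;      /* stands in for the host counter */
int64_t fake_icount;          /* stands in for the icount value */

int64_t model_get_ticks(void)
{
    if (use_icount) {
        return fake_icount;
    }
    if (!ticks_enabled) {
        return ticks_offset;  /* frozen value while disabled */
    }
    return fake_host_ticks + ticks_offset;
}

/* Inverse of model_get_ticks(): choose the offset so that get returns ticks. */
void model_set_ticks(int64_t ticks)
{
    if (use_icount) {
        return;               /* nothing to do in icount mode */
    }
    if (!ticks_enabled) {
        ticks_offset = ticks;
    } else {
        ticks_offset = ticks - fake_host_ticks;
    }
}
```

In this model a reset would call model_set_ticks(0), after which model_get_ticks() counts up from zero again regardless of how far the host counter has advanced.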
Can you do this?
Can you pick my original fix first? I can do what you suggest in a follow-up
patch.
Thanks,
Fernando
[PATCH v2] target-i386: clear guest TSC on reset
From: Fernando Luis Vazquez Cao <fernando@xxxxxxxxxxxxx>
VCPU TSC is not cleared by a warm reset (*), which leaves many Linux
guests vulnerable to the overflow in cyc2ns_offset fixed by upstream
commit 9993bc635d01a6ee7f6b833b4ee65ce7c06350b1 ("sched/x86: Fix overflow
in cyc2ns_offset").
To put it in a nutshell, if a Linux guest without the patch above applied
has been up for more than 208 days and attempts a warm reset, chances are that
the newly booted kernel will panic or hang.
(*) Intel Xeon E5 processors show the same broken behavior due to
the erratum "TSC is Not Affected by Warm Reset" (Intel® Xeon®
Processor E5 Family Specification Update - August 2013): "The
TSC (Time Stamp Counter MSR 10H) should be cleared on
reset. Due to this erratum the TSC is not affected by warm
reset."
Cc: Will Auld <will.auld@xxxxxxxxx>
Cc: Marcelo Tosatti <mtosatti@xxxxxxxxxx>
Signed-off-by: Fernando Luis Vazquez Cao <fernando@xxxxxxxxxxxxx>
---
diff -urNp qemu-orig/target-i386/cpu.c qemu/target-i386/cpu.c
--- qemu-orig/target-i386/cpu.c 2013-11-28 07:02:45.000000000 +0900
+++ qemu/target-i386/cpu.c 2013-12-05 21:45:19.980156320 +0900
@@ -2446,6 +2446,9 @@ static void x86_cpu_reset(CPUState *s)
     cpu_breakpoint_remove_all(env, BP_CPU);
     cpu_watchpoint_remove_all(env, BP_CPU);
 
+    env->tsc_adjust = 0;
+    env->tsc = 0;
+
 #if !defined(CONFIG_USER_ONLY)
     /* We hard-wire the BSP to the first CPU. */
     if (s->cpu_index == 0) {
diff -urNp qemu-orig/target-i386/kvm.c qemu/target-i386/kvm.c
--- qemu-orig/target-i386/kvm.c 2013-11-28 07:02:45.000000000 +0900
+++ qemu/target-i386/kvm.c 2013-12-05 21:45:28.900200552 +0900
@@ -1139,22 +1139,20 @@ static int kvm_put_msrs(X86CPU *cpu, int
         kvm_msr_entry_set(&msrs[n++], MSR_LSTAR, env->lstar);
     }
 #endif
-    if (level == KVM_PUT_FULL_STATE) {
+    /*
+     * The following MSRs have side effects on the guest or are too heavy
+     * for normal writeback. Limit them to reset or full state updates.
+     */
+    if (level >= KVM_PUT_RESET_STATE) {
         /*
          * KVM is yet unable to synchronize TSC values of multiple VCPUs on
          * writeback. Until this is fixed, we only write the offset to SMP
          * guests after migration, desynchronizing the VCPUs, but avoiding
          * huge jump-backs that would occur without any writeback at all.
          */
-        if (smp_cpus == 1 || env->tsc != 0) {
+        if (smp_cpus == 1 || env->tsc != 0 || level == KVM_PUT_RESET_STATE) {
             kvm_msr_entry_set(&msrs[n++], MSR_IA32_TSC, env->tsc);
         }
-    }
-    /*
-     * The following MSRs have side effects on the guest or are too heavy
-     * for normal writeback. Limit them to reset or full state updates.
-     */
-    if (level >= KVM_PUT_RESET_STATE) {
         kvm_msr_entry_set(&msrs[n++], MSR_KVM_SYSTEM_TIME,
                           env->system_time_msr);
         kvm_msr_entry_set(&msrs[n++], MSR_KVM_WALL_CLOCK, env->wall_clock_msr);
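The effect of the kvm_put_msrs() hunk on when MSR_IA32_TSC gets written can be summarized as a pair of predicates. This is a standalone sketch, not code from the patch: the enum is a hypothetical stand-in for QEMU's KVM_PUT_* levels (only their ordering matters here), and the function names are made up for illustration.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical mirror of the KVM_PUT_* writeback levels, lowest first. */
enum put_level {
    PUT_RUNTIME_STATE = 1,
    PUT_RESET_STATE = 2,
    PUT_FULL_STATE = 3
};

/* Condition before the patch: TSC is written only on full state updates. */
bool writes_tsc_before(enum put_level level, int smp_cpus, uint64_t tsc)
{
    return level == PUT_FULL_STATE && (smp_cpus == 1 || tsc != 0);
}

/* Condition after the patch: a reset always writes the TSC back. */
bool writes_tsc_after(enum put_level level, int smp_cpus, uint64_t tsc)
{
    return level >= PUT_RESET_STATE &&
           (smp_cpus == 1 || tsc != 0 || level == PUT_RESET_STATE);
}
```

The case the patch cares about is an SMP guest at reset with env->tsc already cleared to 0: the old condition skips the write, so the guest keeps its stale TSC, while the new one writes the zero. Full state updates behave as before.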