Re: [PATCH] target-i386: clear guest TSC on reset

On 05/12/2013 07:15, Fernando Luis Vázquez Cao wrote:
> VCPU TSC is not cleared by a warm reset (*), which leaves many Linux
> guests vulnerable to the overflow in cyc2ns_offset fixed by upstream
> commit 9993bc635d01a6ee7f6b833b4ee65ce7c06350b1 ("sched/x86: Fix overflow
> in cyc2ns_offset").
> 
> To put it in a nutshell, if a Linux guest without the patch above applied
> has been up more than 208 days and attempts a warm reset, chances are that
> the newly booted kernel will panic or hang.
> 
> (*) Intel Xeon E5 processors show the same broken behavior due to
>     the erratum "TSC is Not Affected by Warm Reset" (Intel® Xeon®
>     Processor E5 Family Specification Update - August 2013): "The
>     TSC (Time Stamp Counter MSR 10H) should be cleared on
>     reset. Due to this erratum the TSC is not affected by warm
>     reset."
> 
> Cc: stable@xxxxxxxxxxxxxxx
> Cc: Will Auld <will.auld@xxxxxxxxx>
> Cc: Marcelo Tosatti <mtosatti@xxxxxxxxxx>
> Signed-off-by: Fernando Luis Vazquez Cao <fernando@xxxxxxxxxxxxx>
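For readers wondering where the 208 days come from, here is a rough
sketch of the arithmetic.  It assumes the pre-fix kernel converts cycles
to nanoseconds as (cyc * scale) >> 10, with scale being nanoseconds per
cycle in 2^10 fixed point, so the intermediate product is about
ns * 2^10 and wraps once ns reaches 2^54.  The function names below are
made up for illustration and the safe variant only mirrors the idea of
the upstream fix (split before multiplying), not its exact code.

```c
#include <assert.h>
#include <stdint.h>

#define CYC2NS_SCALE_FACTOR 10 /* scale is a 2^10 fixed-point value */

/* scale = (ns per ms << 10) / cpu_khz, i.e. ns per cycle times 2^10 */
static uint64_t cyc2ns_scale(uint64_t cpu_khz)
{
    return (UINT64_C(1000000) << CYC2NS_SCALE_FACTOR) / cpu_khz;
}

/* Naive conversion: the product cyc * scale wraps modulo 2^64. */
static uint64_t cyc2ns_naive(uint64_t cyc, uint64_t scale)
{
    return (cyc * scale) >> CYC2NS_SCALE_FACTOR;
}

/* Overflow-safe conversion: split cyc into quotient and remainder first. */
static uint64_t cyc2ns_safe(uint64_t cyc, uint64_t scale)
{
    uint64_t quot = cyc >> CYC2NS_SCALE_FACTOR;
    uint64_t rem = cyc & ((UINT64_C(1) << CYC2NS_SCALE_FACTOR) - 1);

    return quot * scale + ((rem * scale) >> CYC2NS_SCALE_FACTOR);
}

/* Whole days of uptime before ns * 2^10 stops fitting in 64 bits. */
static uint64_t overflow_threshold_days(void)
{
    uint64_t max_ns = UINT64_C(1) << (64 - CYC2NS_SCALE_FACTOR);

    return max_ns / UINT64_C(1000000000) / 86400;
}

/* Usage example on an assumed 1 GHz TSC, where one cycle is one ns. */
static void cyc2ns_example(void)
{
    uint64_t scale = cyc2ns_scale(1000000);
    uint64_t day = UINT64_C(86400) * UINT64_C(1000000000);

    assert(overflow_threshold_days() == 208);
    /* Below the threshold both conversions agree. */
    assert(cyc2ns_naive(day, scale) == cyc2ns_safe(day, scale));
    /* Past the threshold only the split version is still correct. */
    assert(cyc2ns_safe(209 * day, scale) == 209 * day);
    assert(cyc2ns_naive(209 * day, scale) != 209 * day);
}
```

Since a warm reset that keeps the TSC hands the new kernel a cycle count
from the previous boot, the threshold is measured from the last time the
TSC was actually cleared, not from the guest's latest boot.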

I agree that the bug is in QEMU.  One small nit in your patch is that
you should reset env->tsc_adjust and env->tsc in x86_cpu_reset rather
than in kvm_put_msrs.  That change alone would already be pretty good.
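To make the nit concrete, something along these lines is what I have in
mind.  This is only a toy model: ModelEnv, model_cpu_reset and
model_put_msrs are hypothetical stand-ins for CPUX86State, x86_cpu_reset
and kvm_put_msrs, not the real QEMU code.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for the two CPUX86State fields discussed here. */
typedef struct {
    uint64_t tsc;        /* guest TSC value */
    uint64_t tsc_adjust; /* guest TSC_ADJUST value */
} ModelEnv;

/* Stand-in for x86_cpu_reset(): architectural state is cleared here. */
static void model_cpu_reset(ModelEnv *env)
{
    env->tsc = 0;
    env->tsc_adjust = 0;
}

/* Stand-in for the MSR writeback: it only copies state, never edits env. */
static void model_put_msrs(const ModelEnv *env,
                           uint64_t *msr_tsc, uint64_t *msr_tsc_adjust)
{
    *msr_tsc = env->tsc;
    *msr_tsc_adjust = env->tsc_adjust;
}

/* Usage example: after a reset the writeback naturally carries zeros. */
static void model_reset_example(void)
{
    ModelEnv env = { .tsc = UINT64_C(123456789), .tsc_adjust = 42 };
    uint64_t msr_tsc = 1, msr_tsc_adjust = 1;

    model_cpu_reset(&env);
    model_put_msrs(&env, &msr_tsc, &msr_tsc_adjust);
    assert(msr_tsc == 0 && msr_tsc_adjust == 0);
}
```

The point of the split is that the writeback path does not need to know
why it is being called; it just pushes whatever the reset code left in
env.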

However, a bigger problem is that env->tsc is a useless duplicate of
"cpu_get_ticks() + env->tsc_adjust".  It would be nice to drop env->tsc
completely except for migration backwards compatibility.  Thus you can:

- fill in env->tsc with "cpu_get_ticks() + env->tsc_adjust" from
target-i386/machine.c's cpu_pre_save function.  This guarantees
backwards compatibility.

- add a function cpu_set_ticks(int64_t ticks) to cpus.c.  The function
does nothing if use_icount is true, otherwise it needs to have (roughly)
the opposite logic compared to cpu_get_ticks.  You then call this
function from x86_cpu_reset instead of setting env->tsc.  You can
similarly call this function from kvm_get_msrs.

- add a function kvm_set_ticks(int64_t ticks) to kvm-all.c and
kvm-stub.c.  For kvm-all.c it calls kvm_arch_set_ticks(CPUState *cpu,
int64_t ticks) in target-*/kvm.c.  The kvm_arch_set_ticks() function has a
dummy implementation for all architectures except x86.  For x86 it calls
KVM_SET_MSRS passing "ticks + env->tsc_adjust".

- call kvm_set_ticks() from cpu_set_ticks() and cpu_enable_ticks().
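A rough sketch of how the pieces in the list above could fit together,
as a standalone model rather than actual QEMU code.  The timers_state
layout is simplified, fake_host_ticks stands in for the host cycle
counter, kvm_set_ticks is a stub that only records its argument, and
model_pre_save_tsc is a hypothetical name for the cpu_pre_save step.
The icount path and the non-increasing-ticks handling of the real
cpu_get_ticks are left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Simplified model of the timer state kept in cpus.c. */
static struct {
    int64_t cpu_ticks_offset;
    bool cpu_ticks_enabled;
} timers_state;

static bool use_icount;         /* models the icount mode switch */
static int64_t fake_host_ticks; /* stand-in for the host cycle counter */
static int64_t last_kvm_ticks;  /* what the KVM hook was last asked to set */

/* Stub for the proposed kvm_set_ticks(); the real one would end up
 * issuing KVM_SET_MSRS on x86. */
static void kvm_set_ticks(int64_t ticks)
{
    last_kvm_ticks = ticks;
}

/* Simplified cpu_get_ticks(): host counter plus offset while ticking. */
static int64_t cpu_get_ticks(void)
{
    if (!timers_state.cpu_ticks_enabled) {
        return timers_state.cpu_ticks_offset;
    }
    return fake_host_ticks + timers_state.cpu_ticks_offset;
}

/* Proposed inverse: pick the offset so cpu_get_ticks() returns ticks. */
static void cpu_set_ticks(int64_t ticks)
{
    if (use_icount) {
        return; /* nothing to do when icount drives the tick count */
    }
    if (timers_state.cpu_ticks_enabled) {
        timers_state.cpu_ticks_offset = ticks - fake_host_ticks;
    } else {
        timers_state.cpu_ticks_offset = ticks;
    }
    kvm_set_ticks(ticks);
}

/* Stand-in for cpu_pre_save: rebuild env->tsc for old migration streams. */
static uint64_t model_pre_save_tsc(uint64_t tsc_adjust)
{
    return (uint64_t)cpu_get_ticks() + tsc_adjust;
}

/* Usage example: a reset sets the tick count to zero, then time moves on. */
static void ticks_example(void)
{
    timers_state.cpu_ticks_enabled = true;
    fake_host_ticks = 5000;
    cpu_set_ticks(0);
    assert(cpu_get_ticks() == 0);
    fake_host_ticks += 250;
    assert(cpu_get_ticks() == 250);
    assert(model_pre_save_tsc(7) == 257);
}
```

With this shape x86_cpu_reset would call cpu_set_ticks(0) and would no
longer need to touch env->tsc at all, which then only exists for the
migration stream.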

Can you do this?

Thanks,

Paolo

> ---
> 
> --- qemu-orig/target-i386/kvm.c	2013-11-28 07:02:45.000000000 +0900
> +++ qemu/target-i386/kvm.c	2013-12-05 14:47:03.085738175 +0900
> @@ -1125,6 +1125,8 @@ static int kvm_put_msrs(X86CPU *cpu, int
>          kvm_msr_entry_set(&msrs[n++], MSR_VM_HSAVE_PA, env->vm_hsave);
>      }
>      if (has_msr_tsc_adjust) {
> +        if (level == KVM_PUT_RESET_STATE)
> +            env->tsc_adjust = 0;
>          kvm_msr_entry_set(&msrs[n++], MSR_TSC_ADJUST, env->tsc_adjust);
>      }
>      if (has_msr_misc_enable) {
> @@ -1139,22 +1141,22 @@ static int kvm_put_msrs(X86CPU *cpu, int
>          kvm_msr_entry_set(&msrs[n++], MSR_LSTAR, env->lstar);
>      }
>  #endif
> -    if (level == KVM_PUT_FULL_STATE) {
> +    /*
> +     * The following MSRs have side effects on the guest or are too heavy
> +     * for normal writeback. Limit them to reset or full state updates.
> +     */
> +    if (level >= KVM_PUT_RESET_STATE) {
> +        if (level == KVM_PUT_RESET_STATE)
> +            env->tsc = 0;
>          /*
>           * KVM is yet unable to synchronize TSC values of multiple VCPUs on
>           * writeback. Until this is fixed, we only write the offset to SMP
>           * guests after migration, desynchronizing the VCPUs, but avoiding
>           * huge jump-backs that would occur without any writeback at all.
>           */
> -        if (smp_cpus == 1 || env->tsc != 0) {
> +        if (smp_cpus == 1 || env->tsc != 0 || level == KVM_PUT_RESET_STATE) {
>              kvm_msr_entry_set(&msrs[n++], MSR_IA32_TSC, env->tsc);
>          }
> -    }
> -    /*
> -     * The following MSRs have side effects on the guest or are too heavy
> -     * for normal writeback. Limit them to reset or full state updates.
> -     */
> -    if (level >= KVM_PUT_RESET_STATE) {
>          kvm_msr_entry_set(&msrs[n++], MSR_KVM_SYSTEM_TIME,
>                            env->system_time_msr);
>          kvm_msr_entry_set(&msrs[n++], MSR_KVM_WALL_CLOCK, env->wall_clock_msr);
> 
> 
> --
> To unsubscribe from this list: send the line "unsubscribe kvm" in
> the body of a message to majordomo@xxxxxxxxxxxxxxx
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> 
