2017-11-09 0:26 GMT+08:00 Radim Krčmář <rkrcmar@xxxxxxxxxx>:
> 2017-11-06 04:17-0800, Wanpeng Li:
>> From: Wanpeng Li <wanpeng.li@xxxxxxxxxxx>
>>
>> watchdog: BUG: soft lockup - CPU#6 stuck for 22s! [qemu-system-x86:10185]
>> CPU: 6 PID: 10185 Comm: qemu-system-x86 Tainted: G OE 4.14.0-rc4+ #4
>> RIP: 0010:kvm_get_time_scale+0x4e/0xa0 [kvm]
>> Call Trace:
>>  ? get_kvmclock_ns+0xa3/0x140 [kvm]
>>  get_time_ref_counter+0x5a/0x80 [kvm]
>>  kvm_hv_process_stimers+0x120/0x5f0 [kvm]
>>  ? kvm_hv_process_stimers+0x120/0x5f0 [kvm]
>>  ? preempt_schedule+0x27/0x30
>>  ? ___preempt_schedule+0x16/0x18
>>  kvm_arch_vcpu_ioctl_run+0x4b4/0x1690 [kvm]
>>  ? kvm_arch_vcpu_load+0x47/0x230 [kvm]
>>  kvm_vcpu_ioctl+0x33a/0x620 [kvm]
>>  ? kvm_vcpu_ioctl+0x33a/0x620 [kvm]
>>  ? kvm_vm_ioctl_check_extension_generic+0x3b/0x40 [kvm]
>>  ? kvm_dev_ioctl+0x279/0x6c0 [kvm]
>>  do_vfs_ioctl+0xa1/0x5d0
>>  ? __fget+0x73/0xa0
>>  SyS_ioctl+0x79/0x90
>>  entry_SYSCALL_64_fastpath+0x1e/0xa9
>>
>> This can be reproduced by running kvm-unit-tests/hyperv_stimer.flat and a
>> cpu-hotplug stress test simultaneously. __this_cpu_read(cpu_tsc_khz) returns 0
>> (set in kvmclock_cpu_down_prep()) when the pCPU is hot-unplugged, which results
>> in kvm_get_time_scale() getting stuck in an infinite loop.
>>
>> This patch fixes it by skipping the hv_clock setup when the pCPU is offline.
>>
>> Cc: Paolo Bonzini <pbonzini@xxxxxxxxxx>
>> Cc: Radim Krčmář <rkrcmar@xxxxxxxxxx>
>> Signed-off-by: Wanpeng Li <wanpeng.li@xxxxxxxxxxx>
>> ---
>> v1 -> v2:
>>  * avoid infinite loop
>>
>>  arch/x86/kvm/x86.c | 3 +++
>>  1 file changed, 3 insertions(+)
>>
>> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
>> index 03869eb..d2507c6 100644
>> --- a/arch/x86/kvm/x86.c
>> +++ b/arch/x86/kvm/x86.c
>> @@ -1259,6 +1259,9 @@ static void kvm_get_time_scale(uint64_t scaled_hz, uint64_t base_hz,
>>         uint64_t tps64;
>>         uint32_t tps32;
>>
>> +       if (unlikely(base_hz == 0))
>> +               return;
>
> This is a sensible thing to do and will prevent the loop, but KVM will
> still have a minor bug: get_kvmclock_ns() passes uninitialized stack
> values with the expectation that kvm_get_time_scale() will set them, so
> returning here would leave __pvclock_read_cycles() working with random
> data and could inject timer interrupts early (if not worse).
>
> I think it would be best if kvm_get_time_scale() wasn't executing when
> cpu_tsc_khz is 0, by clearing cpu_tsc_khz later and setting it earlier;
> do you see any problems with moving CPUHP_AP_X86_KVM_CLK_ONLINE
> before CPUHP_AP_ONLINE?

I think this will break Thomas's hotplug state machine, and I'm not a
hotplug expert. How about something like the following to avoid the
random data:

diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 34c85aa..954f510 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -1795,10 +1795,13 @@ u64 get_kvmclock_ns(struct kvm *kvm)
        /* both __this_cpu_read() and rdtsc() should be on the same cpu */
        get_cpu();

-       kvm_get_time_scale(NSEC_PER_SEC, __this_cpu_read(cpu_tsc_khz) * 1000LL,
-                          &hv_clock.tsc_shift,
-                          &hv_clock.tsc_to_system_mul);
-       ret = __pvclock_read_cycles(&hv_clock, rdtsc());
+       if (__this_cpu_read(cpu_tsc_khz)) {
+               kvm_get_time_scale(NSEC_PER_SEC, __this_cpu_read(cpu_tsc_khz) * 1000LL,
+                                  &hv_clock.tsc_shift,
+                                  &hv_clock.tsc_to_system_mul);
+               ret = __pvclock_read_cycles(&hv_clock, rdtsc());
+       } else
+               ret = ktime_get_boot_ns() + ka->kvmclock_offset;

        put_cpu();
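
For reference, a minimal standalone userspace sketch of why a zero base_hz
can never terminate the scaling loop: once the intermediate 32-bit value is
0, left-shifting it makes no progress, so the loop condition stays true
forever. This is an approximation of the loop structure only, not the kernel
code (the shift/multiplier bookkeeping is omitted), and the helper name
scale_loop_terminates() is made up for illustration.

/*
 * Approximation of the second scaling loop in kvm_get_time_scale().
 * Returns 1 if the loop converges, 0 if it exceeds max_iters (the real
 * kernel loop has no such guard and would spin forever).
 */
#include <stdint.h>
#include <stdio.h>

static int scale_loop_terminates(uint64_t scaled_hz, uint32_t tps32,
                                 unsigned long max_iters)
{
        uint64_t scaled64 = scaled_hz;
        unsigned long i = 0;

        while (tps32 <= scaled64 || (scaled64 & 0xffffffff00000000ULL)) {
                if (++i > max_iters)
                        return 0;
                if ((scaled64 & 0xffffffff00000000ULL) || (tps32 & 0x80000000))
                        scaled64 >>= 1;
                else
                        scaled64 <<= 1;
                tps32 <<= 1;    /* 0 << 1 stays 0: no progress when base_hz == 0 */
        }
        return 1;
}

int main(void)
{
        /* A 1 GHz TSC converges after a few iterations... */
        printf("tsc_khz=1000000: %s\n",
               scale_loop_terminates(1000000000ULL, 1000000U * 1000U, 1000) ?
               "terminates" : "stuck");
        /* ...while a hot-unplugged pCPU reports cpu_tsc_khz == 0 and spins. */
        printf("tsc_khz=0: %s\n",
               scale_loop_terminates(1000000000ULL, 0, 1000) ?
               "terminates" : "stuck");
        return 0;
}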