Re: Using kvmclock causes a VM to get stuck when a hot-plugged vCPU is set online

Wanpeng Li <kernellwp@xxxxxxxxx> · Thu, 24 May 2018 08:47:31 +0800

2018-05-24 3:33 GMT+08:00 Sergey Al. Slabnov <sergey.slabnov@xxxxxxxxx>:
> Hi!
>
> I found that if you hot-add a vCPU and bring it online, time inside the
> virtual machine breaks down.
> The problem occurs only if the virtual machine is configured with
> kvm-clock. If you turn off kvm-clock on the host side, almost
> everything is fine.
>
>
> My test host system:
> QEMU emulator version 2.11.1 (in KVM mode)
> libvirt 1.3.5
>

What's the version of your host kernel?

Regards,
Wanpeng Li

>
> Part of the libvirt XML configuration I use to manage the KVM guest:
>
>   <vcpu placement='static' current='1'>4</vcpu>
>   <clock offset='utc'>
>     <timer name='pit' tickpolicy='delay'/>
>     <timer name='rtc' tickpolicy='catchup'/>
>     <timer name='hpet' present='no'/>
>     <timer name='kvmclock' present='yes'/>
>   </clock>
>
> Guests: Linux distributions such as CentOS 7, Ubuntu 16 and Ubuntu 18,
> booted with the kernel option clocksource_failover=acpi_pm
>
>
> I also added a udev rule that automatically brings the hot-added
> processor online:
> SUBSYSTEM=="cpu", ACTION=="add", TEST=="online", ATTR{online}=="0", ATTR{online}="1"
>
> Or it can be brought online manually:
> echo '1' > /sys/devices/system/cpu/cpu1/online
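The udev rule and the manual echo above do the same thing: write 1 to the online attribute of a CPU that is currently offline. A minimal bash sketch of that loop, assuming the standard sysfs layout; CPU_SYSFS and online_offline_cpus are hypothetical names introduced here so the loop can also be pointed at a scratch directory for testing:

```shell
#!/usr/bin/env bash
# Sketch: bring every offline CPU online, mirroring the udev rule above.
# CPU_SYSFS is a hypothetical override; on a real guest it is
# /sys/devices/system/cpu and writing requires root.
CPU_SYSFS="${CPU_SYSFS:-/sys/devices/system/cpu}"

online_offline_cpus() {
    local f
    for f in "$CPU_SYSFS"/cpu[0-9]*/online; do
        # cpu0 often has no 'online' file, and the glob may match nothing.
        [ -e "$f" ] || continue
        if [ "$(cat "$f")" = "0" ]; then
            echo "onlining ${f%/online}"
            echo 1 > "$f"
        fi
    done
}
```

On a real guest it would be run once as root after the vCPU is hot-added (online_offline_cpus); it is a one-shot equivalent of the udev rule, not a replacement for it.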
>
> Right after boot, only one processor is available inside the guest.
> /sys/devices/system/clocksource/clocksource0/available_clocksource
> contains:
> kvm-clock tsc acpi_pm
> and current_clocksource is set to kvm-clock.
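To confirm which clocksource the guest is using before and after the hot-plug, the two sysfs files mentioned above can be read together. A small bash sketch; CS_SYSFS and show_clocksource are hypothetical names, with CS_SYSFS defaulting to the real path so the function can also be tested against a fake directory:

```shell
#!/usr/bin/env bash
# Sketch: print the guest's available and current clocksources.
# CS_SYSFS is a hypothetical override; by default it points at the real
# sysfs directory quoted in the message above.
CS_SYSFS="${CS_SYSFS:-/sys/devices/system/clocksource/clocksource0}"

show_clocksource() {
    # Both files hold a single line; available_clocksource is space-separated.
    printf 'available: %s\n' "$(cat "$CS_SYSFS/available_clocksource")"
    printf 'current:   %s\n' "$(cat "$CS_SYSFS/current_clocksource")"
}
```

Running it once right after boot and again after the vCPU is brought online shows whether the clocksource changed across the hot-plug.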
>
> I then add the second processor on the fly via libvirt (QEMU's
> device_add also works, but takes more steps):
> virsh setvcpus instance-XXX --count 2 --live
>
> In this case, a time warp occurs inside the VM as soon as the
> hot-added vCPU is brought online. dmesg shows something like this:
> [  203.999504] CPU1 has been hot-added
> [  204.002868] SMP alternatives: switching to SMP code
> [  204.018793] x86: Booting SMP configuration:
> [  204.018795] smpboot: Booting Node 0 Processor 1 APIC 0x1
> [  204.022651] kvm-clock: cpu 1, msr 0:7ff30041, secondary cpu clock
> [  204.052743] TSC ADJUST compensate: CPU1 observed 494904040684 warp. Adjust: 494904040684
> [  422.405131] TSC ADJUST compensate: CPU1 observed 9 warp. Adjust: 494904040693
> [  422.425178] clocksource: timekeeping watchdog on CPU0: Marking clocksource 'tsc' as unstable because the skew is too large:
> [  422.425178] clocksource:                       'kvm-clock' wd_now: 1c6839b60b6c64 wd_last: 1c6806bfe0d793 mask: ffffffffffffffff
> [  422.425178] clocksource:                       'tsc' cs_now: 734708aa68 cs_last: 72fdac909a mask: ffffffffffffffff
> [  422.425178] tsc: Marking TSC unstable due to clocksource watchdog
>
> 204 seconds after boot, I added and enabled one more processor. At
> the moment the second processor was activated, the system decided that
> the time should now be 422 seconds.
> This always happens: whenever another processor (the second or the
> third) is added and activated, the system time jumps forward by
> approximately the time that has elapsed since boot.
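The watchdog lines in the log above can be checked with some arithmetic: wd_now minus wd_last, and cs_now minus cs_last, give how far each clock moved between two consecutive watchdog checks, which are normally about 0.5 s apart. A bash sketch, assuming kvm-clock values are in nanoseconds; the TSC frequency is not in the log, so the TSC delta is left in cycles:

```shell
#!/usr/bin/env bash
# Worked arithmetic on the clocksource watchdog lines from the dmesg log.
# Assumption: kvm-clock readings are nanoseconds. The TSC frequency is
# unknown here, so only the raw TSC cycle delta is computed.
kvm_now=0x1c6839b60b6c64
kvm_last=0x1c6806bfe0d793
tsc_now=0x734708aa68
tsc_last=0x72fdac909a

kvm_delta_ns=$(( $kvm_now - $kvm_last ))
tsc_delta_cycles=$(( $tsc_now - $tsc_last ))

echo "kvm-clock advanced ${kvm_delta_ns} ns (~$(( kvm_delta_ns / 1000000000 )) s)"
echo "tsc advanced ${tsc_delta_cycles} cycles"
```

With these values, kvm-clock moved 218878350545 ns (about 218.9 s), which matches the jump from 204 s to 422 s, while the TSC moved 1230772686 cycles, on the order of a second or less at typical TSC frequencies. If the assumptions hold, this suggests that kvm-clock, not the TSC, jumped, even though the watchdog ends up marking the TSC unstable.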
>
>
> From 204 s until the real 422 s mark, the system freezes:
> * new sessions cannot be started (neither on the console nor via ssh)
> * in an already open console you can run various programs, but some
> hang at start (_top_ and _timedatectl_, for example)
> * _date_ keeps showing the time at which the additional processor was
> activated, even if you run it several times
> * _hwclock --show_ outputs the correct time
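The frozen-then-jumping wall clock described in the list above can be caught from inside the guest by sampling it once per second and flagging samples that do not advance or that advance too far. A minimal bash sketch; check_samples and the 5 s MAX_STEP threshold are hypothetical choices for illustration:

```shell
#!/usr/bin/env bash
# Sketch: flag stalls and forward jumps in a series of wall-clock samples.
# Input: one epoch timestamp per line, e.g. from 'date +%s' once per second.
# MAX_STEP (seconds) is a hypothetical threshold, not a kernel value.
MAX_STEP="${MAX_STEP:-5}"

check_samples() {
    local prev="" cur
    while read -r cur; do
        if [ -n "$prev" ]; then
            if [ "$cur" -le "$prev" ]; then
                # The clock did not advance between two samples.
                echo "stall at $cur"
            elif [ $(( cur - prev )) -gt "$MAX_STEP" ]; then
                # The clock advanced far more than the sampling interval.
                echo "jump of $(( cur - prev )) s at $cur"
            fi
        fi
        prev="$cur"
    done
}
```

For example: while :; do date +%s; sleep 1; done | check_samples. sleep itself may be affected during the freeze, in which case only the final jump is reported.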
>
>
> If you turn off kvm-clock on the host side, everything is fine:
>     <timer name='kvmclock' present='no'/>
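Since the guest lists tsc and acpi_pm as available clocksources, one way to test whether the problem follows the guest's kvm-clock clocksource, without editing the host XML, may be to switch the guest's clocksource at runtime before hot-adding the vCPU. This is an untested assumption, not something confirmed in this thread. A bash sketch that only writes a name the kernel lists as available; set_clocksource and CS_SYSFS are hypothetical names:

```shell
#!/usr/bin/env bash
# Sketch: switch the guest clocksource at runtime (root needed on a real guest).
# CS_SYSFS is a hypothetical override for testing against a fake directory.
CS_SYSFS="${CS_SYSFS:-/sys/devices/system/clocksource/clocksource0}"

set_clocksource() {
    local want="$1" cs
    # Only write a name that appears in available_clocksource.
    for cs in $(cat "$CS_SYSFS/available_clocksource"); do
        if [ "$cs" = "$want" ]; then
            echo "$want" > "$CS_SYSFS/current_clocksource"
            return 0
        fi
    done
    echo "clocksource '$want' not available" >&2
    return 1
}
```

For example, set_clocksource acpi_pm, then repeat the virsh setvcpus step and compare the dmesg output.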
>
>
> I found that I'm not alone with my time travel problem:
> * changlimin from https://bugzilla.kernel.org/show_bug.cgi?id=195207
> * Ken from https://www.spinics.net/lists/kvm/msg166981.html
> * and some other people whose reports describe the problem less clearly.
>
> With some Linux distributions or kernel versions, everything keeps
> working without freezing, but the logs still show the system time
> jumping forward.
>
>
> Does anyone have an idea why this happens, and what else I can check
> to pin down the cause?
> Completely disabling the kvm-clock driver on the host solves my
> problem, but that is the wrong way to fix it.
>
>
> --
> Thank you,
> Sergey