Fwd: KVM_SYSTEM_TIME clock is marked unstable on a modern single-socket system

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Hi,

I'm working on OSv (https://github.com/cloudius-systems/osv), a guest
operating system.

I've been investigating a phenomena of KVM_SYSTEM_TIME being marked as
unstable (PVCLOCK_TSC_STABLE_BIT cleared) by KVM on a modern
single-socket CPU since the very beginning of guest's life time.
According to the ABI, when pvclock_vcpu_time_info.flags does not have
the PVCLOCK_TSC_STABLE_BIT set we must not assume monotonicity and
need to compensate for this using atomic cmpxchg. However we would
like to avoid that cost unless necessary and since modern
single-socket CPUs have a stable TSC this cost seems unnecessary.

Setups tested:

Intel(R) Core(TM) i5-3210M CPU @ 2.50GHz, kernel 3.13.0-29-generic
(Ubuntu), Qemu 2.0.0
Intel(R) Core(TM) i7-4770 CPU @ 3.40GHz, kernel 3.14.7-200.fc20.x86_64
(Fedora), Qemu 1.6.2

Both CPUs have constant_tsc and nonstop_tsc flags and the host kernel
is using TSC as the master clock source.

Here's what tracing KVM during QEMU (v2.0.0) startup shows. I used
`trace-cmd record -e kvm` and then `trace-cmd report | egrep
'(tsc|clock)'`.

=== 1 vCPU ===

 qemu-system-x86-1353  [003] 75446.815273: kvm_update_master_clock:
masterclock 0 hostclock tsc offsetmatched 0
 qemu-system-x86-1355  [002] 75446.849021: kvm_write_tsc_offset:
vcpu=0 prev=0 next=18446556065789505232
 qemu-system-x86-1355  [002] 75446.849024: kvm_track_tsc:
vcpu_id 0 masterclock 0 offsetmatched 0 nr_online 1 hostclock tsc
 qemu-system-x86-1353  [000] 75446.905313: kvm_write_tsc_offset:
vcpu=0 prev=18446556065789505232 next=18446556065789505232
 qemu-system-x86-1353  [000] 75446.905315: kvm_track_tsc:
vcpu_id 0 masterclock 0 offsetmatched 1 nr_online 1 hostclock tsc
 qemu-system-x86-1353  [000] 75446.925064: kvm_write_tsc_offset:
vcpu=0 prev=18446556065789505232 next=18446556065789505232
 qemu-system-x86-1353  [000] 75446.925065: kvm_track_tsc:
vcpu_id 0 masterclock 0 offsetmatched 2 nr_online 1 hostclock tsc
 qemu-system-x86-1353  [000] 75446.925119: kvm_update_master_clock:
masterclock 0 hostclock tsc offsetmatched 0
 qemu-system-x86-1355  [000] 75446.932931: kvm_update_master_clock:
masterclock 0 hostclock tsc offsetmatched 0

=== 2 vCPUs ===

 qemu-system-x86-1431  [003] 75539.014452: kvm_update_master_clock:
masterclock 0 hostclock tsc offsetmatched 0
 qemu-system-x86-1433  [003] 75539.054169: kvm_write_tsc_offset:
vcpu=0 prev=0 next=18446555836061830054
 qemu-system-x86-1433  [003] 75539.054171: kvm_track_tsc:
vcpu_id 0 masterclock 0 offsetmatched 0 nr_online 1 hostclock tsc
 qemu-system-x86-1434  [003] 75539.078266: kvm_write_tsc_offset:
vcpu=1 prev=0 next=18446555836061830054
 qemu-system-x86-1434  [003] 75539.078269: kvm_track_tsc:
vcpu_id 1 masterclock 0 offsetmatched 1 nr_online 2 hostclock tsc
 qemu-system-x86-1431  [002] 75539.134695: kvm_write_tsc_offset:
vcpu=0 prev=18446555836061830054 next=18446555836061830054
 qemu-system-x86-1431  [002] 75539.134698: kvm_track_tsc:
vcpu_id 0 masterclock 0 offsetmatched 2 nr_online 2 hostclock tsc
 qemu-system-x86-1431  [002] 75539.142465: kvm_write_tsc_offset:
vcpu=1 prev=18446555836061830054 next=18446555836061830054
 qemu-system-x86-1431  [002] 75539.142468: kvm_track_tsc:
vcpu_id 1 masterclock 0 offsetmatched 3 nr_online 2 hostclock tsc
 qemu-system-x86-1431  [002] 75539.182614: kvm_write_tsc_offset:
vcpu=0 prev=18446555836061830054 next=18446555836061830054
 qemu-system-x86-1431  [002] 75539.182617: kvm_track_tsc:
vcpu_id 0 masterclock 0 offsetmatched 4 nr_online 2 hostclock tsc
 qemu-system-x86-1431  [002] 75539.182758: kvm_write_tsc_offset:
vcpu=1 prev=18446555836061830054 next=18446555836061830054
 qemu-system-x86-1431  [002] 75539.182758: kvm_track_tsc:
vcpu_id 1 masterclock 0 offsetmatched 5 nr_online 2 hostclock tsc
 qemu-system-x86-1431  [002] 75539.182840: kvm_update_master_clock:
masterclock 0 hostclock tsc offsetmatched 0
 qemu-system-x86-1433  [003] 75539.194415: kvm_update_master_clock:
masterclock 0 hostclock tsc offsetmatched 0
 qemu-system-x86-1434  [001] 75539.218634: kvm_update_master_clock:
masterclock 0 hostclock tsc offsetmatched 0

When "masterclock" is 0 it means the PVCLOCK_TSC_STABLE_BIT will not
be set. It is not set because offsets are supposedly not matched.
However according to the kvm_track_tsc tracepoints each TSC write is
matched. But there are 3 writes per-CPU which makes
ka->nr_vcpus_matched_tsc to become larger than kvm->online_vcpus which
makes the following check in pvclock_update_vm_gtod_copy() to fail:

   vcpus_matched =  (ka->nr_vcpus_matched_tsc + 1 ==
atomic_read(&kvm->online_vcpus));

I turns out that kvm_write_tsc() is called with the target TSC value
of 0 from these 3 places per each vCPU:

The first one:

 0xffffffffa08ff2b4 : vmx_write_tsc_offset+0xa4/0xb0 [kvm_intel]
 0xffffffffa04c9c05 : kvm_write_tsc+0x1a5/0x360 [kvm]
 0xffffffffa04cfd6b : kvm_arch_vcpu_postcreate+0x4b/0x80 [kvm]
 0xffffffffa04b8188 : kvm_vm_ioctl+0x418/0x750 [kvm]
 0xffffffff811fd0f0
 0xffffffff811fd331 (inexact)
 0xffffffff81121eb6 (inexact)
 0xffffffff81700869 (inexact)

The second one:

 0xffffffffa08ff2b4 : vmx_write_tsc_offset+0xa4/0xb0 [kvm_intel]
 0xffffffffa04c9c05 : kvm_write_tsc+0x1a5/0x360 [kvm]
 0xffffffffa090610d : vmx_set_msr+0x29d/0x350 [kvm_intel]
 0xffffffffa04be83b : do_set_msr+0x3b/0x60 [kvm]
 0xffffffffa04c10a8 : msr_io+0xc8/0x160 [kvm]
 0xffffffffa04caeb6 : kvm_arch_vcpu_ioctl+0xc86/0x1060 [kvm]
 0xffffffffa04b6797 : kvm_vcpu_ioctl+0xc7/0x5a0 [kvm]
 0xffffffff811fd0f0
 0xffffffff811fd331 (inexact)
 0xffffffff81121eb6 (inexact)
 0xffffffff81700869 (inexact)

#0  kvm_vcpu_ioctl (cpu=0x55555637ac40, type=type@entry=0x4008ae89) at
/build/buildd/qemu-2.0.0+dfsg/kvm-all.c:1780
#1  0x000055555585ed82 in kvm_put_msrs (cpu=cpu@entry=0x55555637ac40,
level=level@entry=0x3) at
/build/buildd/qemu-2.0.0+dfsg/target-i386/kvm.c:1270
#2  0x0000555555861edd in kvm_arch_put_registers (cpu=<optimized out>,
level=0x3) at /build/buildd/qemu-2.0.0+dfsg/target-i386/kvm.c:1909
#3  0x000055555582b1a2 in kvm_cpu_synchronize_post_init
(cpu=cpu@entry=0x55555637ac40) at
/build/buildd/qemu-2.0.0+dfsg/kvm-all.c:1641
#4  0x00005555557cef5a in cpu_synchronize_post_init
(cpu=0x55555637ac40) at
/build/buildd/qemu-2.0.0+dfsg/include/sysemu/kvm.h:330
#5  cpu_synchronize_all_post_init () at /build/buildd/qemu-2.0.0+dfsg/cpus.c:521
#6  0x00005555555f0b1d in main (argc=<optimized out>, argv=<optimized
out>, envp=<optimized out>) at /build/buildd/qemu-2.0.0+dfsg/vl.c:4390

The third one:

 0xffffffffa08ff2b4 : vmx_write_tsc_offset+0xa4/0xb0 [kvm_intel]
 0xffffffffa04c9c05 : kvm_write_tsc+0x1a5/0x360 [kvm]
 0xffffffffa090610d : vmx_set_msr+0x29d/0x350 [kvm_intel]
 0xffffffffa04be83b : do_set_msr+0x3b/0x60 [kvm]
 0xffffffffa04c10a8 : msr_io+0xc8/0x160 [kvm]
 0xffffffffa04caeb6 : kvm_arch_vcpu_ioctl+0xc86/0x1060 [kvm]
 0xffffffffa04b6797 : kvm_vcpu_ioctl+0xc7/0x5a0 [kvm]
 0xffffffff811fd0f0
 0xffffffff811fd331 (inexact)
 0xffffffff81121eb6 (inexact)
 0xffffffff81700869 (inexact)

#0  kvm_vcpu_ioctl (cpu=0x55555637ac40, type=type@entry=0x4008ae89) at
/build/buildd/qemu-2.0.0+dfsg/kvm-all.c:1780
#1  0x000055555585ed82 in kvm_put_msrs (cpu=cpu@entry=0x55555637ac40,
level=level@entry=0x2) at
/build/buildd/qemu-2.0.0+dfsg/target-i386/kvm.c:1270
#2  0x0000555555861edd in kvm_arch_put_registers (cpu=<optimized out>,
level=0x2) at /build/buildd/qemu-2.0.0+dfsg/target-i386/kvm.c:1909
#3  0x000055555582b152 in kvm_cpu_synchronize_post_reset
(cpu=cpu@entry=0x55555637ac40) at
/build/buildd/qemu-2.0.0+dfsg/kvm-all.c:1635
#4  0x00005555557ceeda in cpu_synchronize_post_reset
(cpu=0x55555637ac40) at
/build/buildd/qemu-2.0.0+dfsg/include/sysemu/kvm.h:323
#5  cpu_synchronize_all_post_reset () at
/build/buildd/qemu-2.0.0+dfsg/cpus.c:512
#6  0x00005555555f0c5d in main (argc=<optimized out>, argv=<optimized
out>, envp=<optimized out>) at /build/buildd/qemu-2.0.0+dfsg/vl.c:4482

On Qemu 1.6.2 the extra writes are happening only when 1 vCPU is used
because of an explicit conditional check in kvm_put_msrs(). With
larger number of vCPUs the extra writes do not occur and then TSC is
marked as stable. Since Qemu commit
f86746c263753cf7a7e4bdb8829c70272dfcf36c the TSC is written
unconditionally and hence with Qemu 2.0.0 TSC is marked as unstable
for any number of vCPUs.

It looks like these extra writes are not anticipated by the TSC
synchronization logic and cause off-by-n errors on the
nr_vcpus_matched_tsc counter. If there was no overflow of that
variable the TSC would be marked as stable for kvmclock. It seems that
current logic works in a way that TSC writes which occur on all vCPUs
approximately at the same time (within threshold) on a system with a
stable TSC will cause a common TSC offset to be used and
PVCLOCK_TSC_STABLE_BIT to be set. Perhaps this logic could be adjusted
to tolerate these extra TSC writes so that if they match, TSC will be
marked as stable for kvmclock.

Before going forward with any fixes I would like to know what are your
thoughts about this.

--
Tomek
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html




[Index of Archives]     [KVM ARM]     [KVM ia64]     [KVM ppc]     [Virtualization Tools]     [Spice Development]     [Libvirt]     [Libvirt Users]     [Linux USB Devel]     [Linux Audio Users]     [Yosemite Questions]     [Linux Kernel]     [Linux SCSI]     [XFree86]
  Powered by Linux