Re: Fwd: KVM_SYSTEM_TIME clock is marked unstable on a modern single-socket system

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Wed, Jun 18, 2014 at 01:38:09PM +0200, Tomasz Grabiec wrote:
> Hi,
> 
> I'm working on OSv (https://github.com/cloudius-systems/osv), a guest
> operating system.

Right, please add a "tsc_matched" field to kvm_vcpu_arch (to keep
track per-vcpu) and then increment the global counter only once
per vcpu.

> I've been investigating a phenomena of KVM_SYSTEM_TIME being marked as
> unstable (PVCLOCK_TSC_STABLE_BIT cleared) by KVM on a modern
> single-socket CPU since the very beginning of guest's life time.
> According to the ABI, when pvclock_vcpu_time_info.flags does not have
> the PVCLOCK_TSC_STABLE_BIT set we must not assume monotonicity and
> need to compensate for this using atomic cmpxchg. However we would
> like to avoid that cost unless necessary and since modern
> single-socket CPUs have a stable TSC this cost seems unnecessary.
> 
> Setups tested:
> 
> Intel(R) Core(TM) i5-3210M CPU @ 2.50GHz, kernel 3.13.0-29-generic
> (Ubuntu), Qemu 2.0.0
> Intel(R) Core(TM) i7-4770 CPU @ 3.40GHz, kernel 3.14.7-200.fc20.x86_64
> (Fedora), Qemu 1.6.2
> 
> Both CPUs have constant_tsc and nonstop_tsc flags and the host kernel
> is using TSC as the master clock source.
> 
> Here's what tracing KVM during QEMU (v2.0.0) startup shows. I used
> `trace-cmd record -e kvm` and then `trace-cmd report | egrep
> '(tsc|clock)'`.
> 
> === 1 vCPU ===
> 
>  qemu-system-x86-1353  [003] 75446.815273: kvm_update_master_clock:
> masterclock 0 hostclock tsc offsetmatched 0
>  qemu-system-x86-1355  [002] 75446.849021: kvm_write_tsc_offset:
> vcpu=0 prev=0 next=18446556065789505232
>  qemu-system-x86-1355  [002] 75446.849024: kvm_track_tsc:
> vcpu_id 0 masterclock 0 offsetmatched 0 nr_online 1 hostclock tsc
>  qemu-system-x86-1353  [000] 75446.905313: kvm_write_tsc_offset:
> vcpu=0 prev=18446556065789505232 next=18446556065789505232
>  qemu-system-x86-1353  [000] 75446.905315: kvm_track_tsc:
> vcpu_id 0 masterclock 0 offsetmatched 1 nr_online 1 hostclock tsc
>  qemu-system-x86-1353  [000] 75446.925064: kvm_write_tsc_offset:
> vcpu=0 prev=18446556065789505232 next=18446556065789505232
>  qemu-system-x86-1353  [000] 75446.925065: kvm_track_tsc:
> vcpu_id 0 masterclock 0 offsetmatched 2 nr_online 1 hostclock tsc
>  qemu-system-x86-1353  [000] 75446.925119: kvm_update_master_clock:
> masterclock 0 hostclock tsc offsetmatched 0
>  qemu-system-x86-1355  [000] 75446.932931: kvm_update_master_clock:
> masterclock 0 hostclock tsc offsetmatched 0
> 
> === 2 vCPUs ===
> 
>  qemu-system-x86-1431  [003] 75539.014452: kvm_update_master_clock:
> masterclock 0 hostclock tsc offsetmatched 0
>  qemu-system-x86-1433  [003] 75539.054169: kvm_write_tsc_offset:
> vcpu=0 prev=0 next=18446555836061830054
>  qemu-system-x86-1433  [003] 75539.054171: kvm_track_tsc:
> vcpu_id 0 masterclock 0 offsetmatched 0 nr_online 1 hostclock tsc
>  qemu-system-x86-1434  [003] 75539.078266: kvm_write_tsc_offset:
> vcpu=1 prev=0 next=18446555836061830054
>  qemu-system-x86-1434  [003] 75539.078269: kvm_track_tsc:
> vcpu_id 1 masterclock 0 offsetmatched 1 nr_online 2 hostclock tsc
>  qemu-system-x86-1431  [002] 75539.134695: kvm_write_tsc_offset:
> vcpu=0 prev=18446555836061830054 next=18446555836061830054
>  qemu-system-x86-1431  [002] 75539.134698: kvm_track_tsc:
> vcpu_id 0 masterclock 0 offsetmatched 2 nr_online 2 hostclock tsc
>  qemu-system-x86-1431  [002] 75539.142465: kvm_write_tsc_offset:
> vcpu=1 prev=18446555836061830054 next=18446555836061830054
>  qemu-system-x86-1431  [002] 75539.142468: kvm_track_tsc:
> vcpu_id 1 masterclock 0 offsetmatched 3 nr_online 2 hostclock tsc
>  qemu-system-x86-1431  [002] 75539.182614: kvm_write_tsc_offset:
> vcpu=0 prev=18446555836061830054 next=18446555836061830054
>  qemu-system-x86-1431  [002] 75539.182617: kvm_track_tsc:
> vcpu_id 0 masterclock 0 offsetmatched 4 nr_online 2 hostclock tsc
>  qemu-system-x86-1431  [002] 75539.182758: kvm_write_tsc_offset:
> vcpu=1 prev=18446555836061830054 next=18446555836061830054
>  qemu-system-x86-1431  [002] 75539.182758: kvm_track_tsc:
> vcpu_id 1 masterclock 0 offsetmatched 5 nr_online 2 hostclock tsc
>  qemu-system-x86-1431  [002] 75539.182840: kvm_update_master_clock:
> masterclock 0 hostclock tsc offsetmatched 0
>  qemu-system-x86-1433  [003] 75539.194415: kvm_update_master_clock:
> masterclock 0 hostclock tsc offsetmatched 0
>  qemu-system-x86-1434  [001] 75539.218634: kvm_update_master_clock:
> masterclock 0 hostclock tsc offsetmatched 0
> 
> When "masterclock" is 0 it means the PVCLOCK_TSC_STABLE_BIT will not
> be set. It is not set because offsets are supposedly not matched.
> However according to the kvm_track_tsc tracepoints each TSC write is
> matched. But there are 3 writes per-CPU which makes
> ka->nr_vcpus_matched_tsc to become larger than kvm->online_vcpus which
> makes the following check in pvclock_update_vm_gtod_copy() to fail:
> 
>    vcpus_matched =  (ka->nr_vcpus_matched_tsc + 1 ==
> atomic_read(&kvm->online_vcpus));
> 
> I turns out that kvm_write_tsc() is called with the target TSC value
> of 0 from these 3 places per each vCPU:
> 
> The first one:
> 
>  0xffffffffa08ff2b4 : vmx_write_tsc_offset+0xa4/0xb0 [kvm_intel]
>  0xffffffffa04c9c05 : kvm_write_tsc+0x1a5/0x360 [kvm]
>  0xffffffffa04cfd6b : kvm_arch_vcpu_postcreate+0x4b/0x80 [kvm]
>  0xffffffffa04b8188 : kvm_vm_ioctl+0x418/0x750 [kvm]
>  0xffffffff811fd0f0
>  0xffffffff811fd331 (inexact)
>  0xffffffff81121eb6 (inexact)
>  0xffffffff81700869 (inexact)
> 
> The second one:
> 
>  0xffffffffa08ff2b4 : vmx_write_tsc_offset+0xa4/0xb0 [kvm_intel]
>  0xffffffffa04c9c05 : kvm_write_tsc+0x1a5/0x360 [kvm]
>  0xffffffffa090610d : vmx_set_msr+0x29d/0x350 [kvm_intel]
>  0xffffffffa04be83b : do_set_msr+0x3b/0x60 [kvm]
>  0xffffffffa04c10a8 : msr_io+0xc8/0x160 [kvm]
>  0xffffffffa04caeb6 : kvm_arch_vcpu_ioctl+0xc86/0x1060 [kvm]
>  0xffffffffa04b6797 : kvm_vcpu_ioctl+0xc7/0x5a0 [kvm]
>  0xffffffff811fd0f0
>  0xffffffff811fd331 (inexact)
>  0xffffffff81121eb6 (inexact)
>  0xffffffff81700869 (inexact)
> 
> #0  kvm_vcpu_ioctl (cpu=0x55555637ac40, type=type@entry=0x4008ae89) at
> /build/buildd/qemu-2.0.0+dfsg/kvm-all.c:1780
> #1  0x000055555585ed82 in kvm_put_msrs (cpu=cpu@entry=0x55555637ac40,
> level=level@entry=0x3) at
> /build/buildd/qemu-2.0.0+dfsg/target-i386/kvm.c:1270
> #2  0x0000555555861edd in kvm_arch_put_registers (cpu=<optimized out>,
> level=0x3) at /build/buildd/qemu-2.0.0+dfsg/target-i386/kvm.c:1909
> #3  0x000055555582b1a2 in kvm_cpu_synchronize_post_init
> (cpu=cpu@entry=0x55555637ac40) at
> /build/buildd/qemu-2.0.0+dfsg/kvm-all.c:1641
> #4  0x00005555557cef5a in cpu_synchronize_post_init
> (cpu=0x55555637ac40) at
> /build/buildd/qemu-2.0.0+dfsg/include/sysemu/kvm.h:330
> #5  cpu_synchronize_all_post_init () at /build/buildd/qemu-2.0.0+dfsg/cpus.c:521
> #6  0x00005555555f0b1d in main (argc=<optimized out>, argv=<optimized
> out>, envp=<optimized out>) at /build/buildd/qemu-2.0.0+dfsg/vl.c:4390
> 
> The third one:
> 
>  0xffffffffa08ff2b4 : vmx_write_tsc_offset+0xa4/0xb0 [kvm_intel]
>  0xffffffffa04c9c05 : kvm_write_tsc+0x1a5/0x360 [kvm]
>  0xffffffffa090610d : vmx_set_msr+0x29d/0x350 [kvm_intel]
>  0xffffffffa04be83b : do_set_msr+0x3b/0x60 [kvm]
>  0xffffffffa04c10a8 : msr_io+0xc8/0x160 [kvm]
>  0xffffffffa04caeb6 : kvm_arch_vcpu_ioctl+0xc86/0x1060 [kvm]
>  0xffffffffa04b6797 : kvm_vcpu_ioctl+0xc7/0x5a0 [kvm]
>  0xffffffff811fd0f0
>  0xffffffff811fd331 (inexact)
>  0xffffffff81121eb6 (inexact)
>  0xffffffff81700869 (inexact)
> 
> #0  kvm_vcpu_ioctl (cpu=0x55555637ac40, type=type@entry=0x4008ae89) at
> /build/buildd/qemu-2.0.0+dfsg/kvm-all.c:1780
> #1  0x000055555585ed82 in kvm_put_msrs (cpu=cpu@entry=0x55555637ac40,
> level=level@entry=0x2) at
> /build/buildd/qemu-2.0.0+dfsg/target-i386/kvm.c:1270
> #2  0x0000555555861edd in kvm_arch_put_registers (cpu=<optimized out>,
> level=0x2) at /build/buildd/qemu-2.0.0+dfsg/target-i386/kvm.c:1909
> #3  0x000055555582b152 in kvm_cpu_synchronize_post_reset
> (cpu=cpu@entry=0x55555637ac40) at
> /build/buildd/qemu-2.0.0+dfsg/kvm-all.c:1635
> #4  0x00005555557ceeda in cpu_synchronize_post_reset
> (cpu=0x55555637ac40) at
> /build/buildd/qemu-2.0.0+dfsg/include/sysemu/kvm.h:323
> #5  cpu_synchronize_all_post_reset () at
> /build/buildd/qemu-2.0.0+dfsg/cpus.c:512
> #6  0x00005555555f0c5d in main (argc=<optimized out>, argv=<optimized
> out>, envp=<optimized out>) at /build/buildd/qemu-2.0.0+dfsg/vl.c:4482
> 
> On Qemu 1.6.2 the extra writes are happening only when 1 vCPU is used
> because of an explicit conditional check in kvm_put_msrs(). With
> larger number of vCPUs the extra writes do not occur and then TSC is
> marked as stable. Since Qemu commit
> f86746c263753cf7a7e4bdb8829c70272dfcf36c the TSC is written
> unconditionally and hence with Qemu 2.0.0 TSC is marked as unstable
> for any number of vCPUs.
> 
> It looks like these extra writes are not anticipated by the TSC
> synchronization logic and cause off-by-n errors on the
> nr_vcpus_matched_tsc counter. If there was no overflow of that
> variable the TSC would be marked as stable for kvmclock. It seems that
> current logic works in a way that TSC writes which occur on all vCPUs
> approximately at the same time (within threshold) on a system with a
> stable TSC will cause a common TSC offset to be used and
> PVCLOCK_TSC_STABLE_BIT to be set. Perhaps this logic could be adjusted
> to tolerate these extra TSC writes so that if they match, TSC will be
> marked as stable for kvmclock.
> 
> Before going forward with any fixes I would like to know what are your
> thoughts about this.
> 
> --
> Tomek
> --
> To unsubscribe from this list: send the line "unsubscribe kvm" in
> the body of a message to majordomo@xxxxxxxxxxxxxxx
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html




[Index of Archives]     [KVM ARM]     [KVM ia64]     [KVM ppc]     [Virtualization Tools]     [Spice Development]     [Libvirt]     [Libvirt Users]     [Linux USB Devel]     [Linux Audio Users]     [Yosemite Questions]     [Linux Kernel]     [Linux SCSI]     [XFree86]
  Powered by Linux