On Wed, Jun 18, 2014 at 01:38:09PM +0200, Tomasz Grabiec wrote: > Hi, > > I'm working on OSv (https://github.com/cloudius-systems/osv), a guest > operating system. Right, please add a "tsc_matched" field to kvm_vcpu_arch (to keep track per-vcpu) and then increment the global counter only once per vcpu. > I've been investigating a phenomena of KVM_SYSTEM_TIME being marked as > unstable (PVCLOCK_TSC_STABLE_BIT cleared) by KVM on a modern > single-socket CPU since the very beginning of guest's life time. > According to the ABI, when pvclock_vcpu_time_info.flags does not have > the PVCLOCK_TSC_STABLE_BIT set we must not assume monotonicity and > need to compensate for this using atomic cmpxchg. However we would > like to avoid that cost unless necessary and since modern > single-socket CPUs have a stable TSC this cost seems unnecessary. > > Setups tested: > > Intel(R) Core(TM) i5-3210M CPU @ 2.50GHz, kernel 3.13.0-29-generic > (Ubuntu), Qemu 2.0.0 > Intel(R) Core(TM) i7-4770 CPU @ 3.40GHz, kernel 3.14.7-200.fc20.x86_64 > (Fedora), Qemu 1.6.2 > > Both CPUs have constant_tsc and nonstop_tsc flags and the host kernel > is using TSC as the master clock source. > > Here's what tracing KVM during QEMU (v2.0.0) startup shows. I used > `trace-cmd record -e kvm` and then `trace-cmd report | egrep > '(tsc|clock)'`. > > === 1 vCPU === > > qemu-system-x86-1353 [003] 75446.815273: kvm_update_master_clock: > masterclock 0 hostclock tsc offsetmatched 0 > qemu-system-x86-1355 [002] 75446.849021: kvm_write_tsc_offset: > vcpu=0 prev=0 next=18446556065789505232 > qemu-system-x86-1355 [002] 75446.849024: kvm_track_tsc: > vcpu_id 0 masterclock 0 offsetmatched 0 nr_online 1 hostclock tsc > qemu-system-x86-1353 [000] 75446.905313: kvm_write_tsc_offset: > vcpu=0 prev=18446556065789505232 next=18446556065789505232 > qemu-system-x86-1353 [000] 75446.905315: kvm_track_tsc: > vcpu_id 0 masterclock 0 offsetmatched 1 nr_online 1 hostclock tsc > qemu-system-x86-1353 [000] 75446.925064: kvm_write_tsc_offset: > vcpu=0 prev=18446556065789505232 next=18446556065789505232 > qemu-system-x86-1353 [000] 75446.925065: kvm_track_tsc: > vcpu_id 0 masterclock 0 offsetmatched 2 nr_online 1 hostclock tsc > qemu-system-x86-1353 [000] 75446.925119: kvm_update_master_clock: > masterclock 0 hostclock tsc offsetmatched 0 > qemu-system-x86-1355 [000] 75446.932931: kvm_update_master_clock: > masterclock 0 hostclock tsc offsetmatched 0 > > === 2 vCPUs === > > qemu-system-x86-1431 [003] 75539.014452: kvm_update_master_clock: > masterclock 0 hostclock tsc offsetmatched 0 > qemu-system-x86-1433 [003] 75539.054169: kvm_write_tsc_offset: > vcpu=0 prev=0 next=18446555836061830054 > qemu-system-x86-1433 [003] 75539.054171: kvm_track_tsc: > vcpu_id 0 masterclock 0 offsetmatched 0 nr_online 1 hostclock tsc > qemu-system-x86-1434 [003] 75539.078266: kvm_write_tsc_offset: > vcpu=1 prev=0 next=18446555836061830054 > qemu-system-x86-1434 [003] 75539.078269: kvm_track_tsc: > vcpu_id 1 masterclock 0 offsetmatched 1 nr_online 2 hostclock tsc > qemu-system-x86-1431 [002] 75539.134695: kvm_write_tsc_offset: > vcpu=0 prev=18446555836061830054 next=18446555836061830054 > qemu-system-x86-1431 [002] 75539.134698: kvm_track_tsc: > vcpu_id 0 masterclock 0 offsetmatched 2 nr_online 2 hostclock tsc > qemu-system-x86-1431 [002] 75539.142465: kvm_write_tsc_offset: > vcpu=1 prev=18446555836061830054 next=18446555836061830054 > qemu-system-x86-1431 [002] 75539.142468: kvm_track_tsc: > vcpu_id 1 masterclock 0 offsetmatched 3 nr_online 2 hostclock tsc > qemu-system-x86-1431 [002] 75539.182614: kvm_write_tsc_offset: > vcpu=0 prev=18446555836061830054 next=18446555836061830054 > qemu-system-x86-1431 [002] 75539.182617: kvm_track_tsc: > vcpu_id 0 masterclock 0 offsetmatched 4 nr_online 2 hostclock tsc > qemu-system-x86-1431 [002] 75539.182758: kvm_write_tsc_offset: > vcpu=1 prev=18446555836061830054 next=18446555836061830054 > qemu-system-x86-1431 [002] 75539.182758: kvm_track_tsc: > vcpu_id 1 masterclock 0 offsetmatched 5 nr_online 2 hostclock tsc > qemu-system-x86-1431 [002] 75539.182840: kvm_update_master_clock: > masterclock 0 hostclock tsc offsetmatched 0 > qemu-system-x86-1433 [003] 75539.194415: kvm_update_master_clock: > masterclock 0 hostclock tsc offsetmatched 0 > qemu-system-x86-1434 [001] 75539.218634: kvm_update_master_clock: > masterclock 0 hostclock tsc offsetmatched 0 > > When "masterclock" is 0 it means the PVCLOCK_TSC_STABLE_BIT will not > be set. It is not set because offsets are supposedly not matched. > However according to the kvm_track_tsc tracepoints each TSC write is > matched. But there are 3 writes per-CPU which makes > ka->nr_vcpus_matched_tsc to become larger than kvm->online_vcpus which > makes the following check in pvclock_update_vm_gtod_copy() to fail: > > vcpus_matched = (ka->nr_vcpus_matched_tsc + 1 == > atomic_read(&kvm->online_vcpus)); > > I turns out that kvm_write_tsc() is called with the target TSC value > of 0 from these 3 places per each vCPU: > > The first one: > > 0xffffffffa08ff2b4 : vmx_write_tsc_offset+0xa4/0xb0 [kvm_intel] > 0xffffffffa04c9c05 : kvm_write_tsc+0x1a5/0x360 [kvm] > 0xffffffffa04cfd6b : kvm_arch_vcpu_postcreate+0x4b/0x80 [kvm] > 0xffffffffa04b8188 : kvm_vm_ioctl+0x418/0x750 [kvm] > 0xffffffff811fd0f0 > 0xffffffff811fd331 (inexact) > 0xffffffff81121eb6 (inexact) > 0xffffffff81700869 (inexact) > > The second one: > > 0xffffffffa08ff2b4 : vmx_write_tsc_offset+0xa4/0xb0 [kvm_intel] > 0xffffffffa04c9c05 : kvm_write_tsc+0x1a5/0x360 [kvm] > 0xffffffffa090610d : vmx_set_msr+0x29d/0x350 [kvm_intel] > 0xffffffffa04be83b : do_set_msr+0x3b/0x60 [kvm] > 0xffffffffa04c10a8 : msr_io+0xc8/0x160 [kvm] > 0xffffffffa04caeb6 : kvm_arch_vcpu_ioctl+0xc86/0x1060 [kvm] > 0xffffffffa04b6797 : kvm_vcpu_ioctl+0xc7/0x5a0 [kvm] > 0xffffffff811fd0f0 > 0xffffffff811fd331 (inexact) > 0xffffffff81121eb6 (inexact) > 0xffffffff81700869 (inexact) > > #0 kvm_vcpu_ioctl (cpu=0x55555637ac40, type=type@entry=0x4008ae89) at > /build/buildd/qemu-2.0.0+dfsg/kvm-all.c:1780 > #1 0x000055555585ed82 in kvm_put_msrs (cpu=cpu@entry=0x55555637ac40, > level=level@entry=0x3) at > /build/buildd/qemu-2.0.0+dfsg/target-i386/kvm.c:1270 > #2 0x0000555555861edd in kvm_arch_put_registers (cpu=<optimized out>, > level=0x3) at /build/buildd/qemu-2.0.0+dfsg/target-i386/kvm.c:1909 > #3 0x000055555582b1a2 in kvm_cpu_synchronize_post_init > (cpu=cpu@entry=0x55555637ac40) at > /build/buildd/qemu-2.0.0+dfsg/kvm-all.c:1641 > #4 0x00005555557cef5a in cpu_synchronize_post_init > (cpu=0x55555637ac40) at > /build/buildd/qemu-2.0.0+dfsg/include/sysemu/kvm.h:330 > #5 cpu_synchronize_all_post_init () at /build/buildd/qemu-2.0.0+dfsg/cpus.c:521 > #6 0x00005555555f0b1d in main (argc=<optimized out>, argv=<optimized > out>, envp=<optimized out>) at /build/buildd/qemu-2.0.0+dfsg/vl.c:4390 > > The third one: > > 0xffffffffa08ff2b4 : vmx_write_tsc_offset+0xa4/0xb0 [kvm_intel] > 0xffffffffa04c9c05 : kvm_write_tsc+0x1a5/0x360 [kvm] > 0xffffffffa090610d : vmx_set_msr+0x29d/0x350 [kvm_intel] > 0xffffffffa04be83b : do_set_msr+0x3b/0x60 [kvm] > 0xffffffffa04c10a8 : msr_io+0xc8/0x160 [kvm] > 0xffffffffa04caeb6 : kvm_arch_vcpu_ioctl+0xc86/0x1060 [kvm] > 0xffffffffa04b6797 : kvm_vcpu_ioctl+0xc7/0x5a0 [kvm] > 0xffffffff811fd0f0 > 0xffffffff811fd331 (inexact) > 0xffffffff81121eb6 (inexact) > 0xffffffff81700869 (inexact) > > #0 kvm_vcpu_ioctl (cpu=0x55555637ac40, type=type@entry=0x4008ae89) at > /build/buildd/qemu-2.0.0+dfsg/kvm-all.c:1780 > #1 0x000055555585ed82 in kvm_put_msrs (cpu=cpu@entry=0x55555637ac40, > level=level@entry=0x2) at > /build/buildd/qemu-2.0.0+dfsg/target-i386/kvm.c:1270 > #2 0x0000555555861edd in kvm_arch_put_registers (cpu=<optimized out>, > level=0x2) at /build/buildd/qemu-2.0.0+dfsg/target-i386/kvm.c:1909 > #3 0x000055555582b152 in kvm_cpu_synchronize_post_reset > (cpu=cpu@entry=0x55555637ac40) at > /build/buildd/qemu-2.0.0+dfsg/kvm-all.c:1635 > #4 0x00005555557ceeda in cpu_synchronize_post_reset > (cpu=0x55555637ac40) at > /build/buildd/qemu-2.0.0+dfsg/include/sysemu/kvm.h:323 > #5 cpu_synchronize_all_post_reset () at > /build/buildd/qemu-2.0.0+dfsg/cpus.c:512 > #6 0x00005555555f0c5d in main (argc=<optimized out>, argv=<optimized > out>, envp=<optimized out>) at /build/buildd/qemu-2.0.0+dfsg/vl.c:4482 > > On Qemu 1.6.2 the extra writes are happening only when 1 vCPU is used > because of an explicit conditional check in kvm_put_msrs(). With > larger number of vCPUs the extra writes do not occur and then TSC is > marked as stable. Since Qemu commit > f86746c263753cf7a7e4bdb8829c70272dfcf36c the TSC is written > unconditionally and hence with Qemu 2.0.0 TSC is marked as unstable > for any number of vCPUs. > > It looks like these extra writes are not anticipated by the TSC > synchronization logic and cause off-by-n errors on the > nr_vcpus_matched_tsc counter. If there was no overflow of that > variable the TSC would be marked as stable for kvmclock. It seems that > current logic works in a way that TSC writes which occur on all vCPUs > approximately at the same time (within threshold) on a system with a > stable TSC will cause a common TSC offset to be used and > PVCLOCK_TSC_STABLE_BIT to be set. Perhaps this logic could be adjusted > to tolerate these extra TSC writes so that if they match, TSC will be > marked as stable for kvmclock. > > Before going forward with any fixes I would like to know what are your > thoughts about this. > > -- > Tomek > -- > To unsubscribe from this list: send the line "unsubscribe kvm" in > the body of a message to majordomo@xxxxxxxxxxxxxxx > More majordomo info at http://vger.kernel.org/majordomo-info.html -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html