Guest TSC clock offset

Hi all,
I did some quick research as a follow-up to our Plumbers tracing BoF; here is
what I found:

KVM maintains per VCPU TSC time offset and scaling ratio:
    vcpu->arch.tsc_offset
    vcpu->arch.l1_tsc_offset
    vcpu->arch.tsc_scaling_ratio

There is a KVM ioctl for getting the current TSC frequency of the guest,
which works per VCPU:
    KVM_GET_TSC_KHZ

There is a newly proposed patch set introducing new KVM ioctls for
getting the TSC offset, per VCPU:
    https://www.spinics.net/lists/kvm/msg220471.html

KVM ioctls are well documented:
    https://www.kernel.org/doc/Documentation/virtual/kvm/api.txt
Each KVM ioctl has a scope: system, VM, or VCPU. The VM and VCPU ioctls can be
executed only from the context of the task that created that VM / VCPU. As the
ioctls for TSC offset and scaling have VCPU scope, we cannot use them from the
trace-cmd context.

There is a kvm_write_tsc_offset trace event, which can be used to track the
TSC offset. I observed that the offset is quite stable, at least over a short
period - there are only a few small adjustments. These offsets are set at VM
boot time, so we cannot rely on the trace event later. We can still track it,
just in case the offsets change during the trace.
 CPU-9094  [001]   800.866350: kvm_write_tsc_offset: vcpu=0 prev=0 next=18446742520796313836
 CPU-9095  [001]   800.867593: kvm_write_tsc_offset: vcpu=1 prev=0 next=18446742520796313836
 CPU-9094  [001]   800.903735: kvm_write_tsc_offset: vcpu=0 prev=18446742520796313836 next=18446742520796313836
 CPU-9095  [001]   800.903858: kvm_write_tsc_offset: vcpu=1 prev=18446742520796313836 next=18446742520796313836
 CPU-9094  [005]   800.916755: kvm_write_tsc_offset: vcpu=0 prev=18446742520796313836 next=18446742520796313836
 CPU-9095  [001]   800.916800: kvm_write_tsc_offset: vcpu=1 prev=18446742520796313836 next=18446742520796313836
 CPU-9164  [000]   802.612779: kvm_write_tsc_offset: vcpu=0 prev=0 next=18446742517485073862
 CPU-9164  [000]   802.629874: kvm_write_tsc_offset: vcpu=0 prev=18446742517485073862 next=18446742517485073862
 CPU-9164  [000]   802.632420: kvm_write_tsc_offset: vcpu=0 prev=18446742517485073862 next=18446742517485073862
 CPU-9246  [007]   813.272152: kvm_write_tsc_offset: vcpu=0 prev=0 next=18446742497274838435
 CPU-9246  [004]   813.289768: kvm_write_tsc_offset: vcpu=0 prev=18446742497274838435 next=18446742497274838435
 CPU-9246  [004]   813.292385: kvm_write_tsc_offset: vcpu=0 prev=18446742497274838435 next=18446742497274838435

There are entries in KVM debugfs for getting the offset and scaling,
per VM and per VCPU:
 /sys/kernel/debug/kvm/175263-14/vcpu0/tsc-offset
 /sys/kernel/debug/kvm/175263-14/vcpu0/tsc-scaling-ratio
where 175263 is the PID of the qemu task running the VM; not sure what 14 is.
I noticed a small mismatch - vcpu->arch.tsc_offset is u64, while the value in
vcpu0/tsc-offset is printed as %lld and appears negative: -1552913237780.
It seems that the best solution is to get the offset and the scaling from
here, and maybe also track kvm_write_tsc_offset during the trace, just in
case.

I have a few concerns:
  The solution is KVM specific. Maybe we should keep the PTP-like algorithm as
  a fallback option for all other cases ?
  This offset is applicable to the x86-tsc trace clock source only. Should we
  force the user to use only this clock source for host-guest tracing, or
  allow using all trace clocks ? In the case of a non-x86-tsc trace clock,
  use the PTP-like algorithm to calculate the offset ?

-- 
Tzvetomir (Ceco) Stoyanov
VMware Open Source Technology Center


