Re: [RFC PATCH 1/2] timekeeper: change interface of clocksource reding functions

Denis Plotnikov <dplotnikov@xxxxxxxxxxxxx> · Fri, 21 Jul 2017 17:00:11 +0300

On 10.07.2017 16:00, Radim Krčmář wrote:
2017-06-28 13:55+0300, Denis Plotnikov:
When using timekeepeing API in some cases it is useful to return
timestamp value used along with the time calculated to use that
timestamp value for other purpuses (e.g. in KVM master clock timestamp)

Makes sense.  What I don't like about this interface is the TSC centric
approach in a code that isn't even x86 specific -- other architectures
might have a similar counter they'd like to use.

Could we get it a bit more generic?
Yes, sure

At least returning the type of the clock and its counter value.
(kvmclock is a bit problematic for the generic solution, because it is
  TSC based, but should pass through the kvmclock value if we were going
  to make it clean ...)

---
Actually, we might be overengineering it.  With the master clock we have
now, I'm thinking that the gtod is not crucial and we could as well be
using something like:

   static bool kvm_get_time_and_clockread(s64 *kernel_ns, u64 *cycle_now)
   {
   	local_irq_save(flags);
   	*cycle_now = rdtsc();
   	*kernel_ns = ktime_get_boot_ns();
   	local_irq_restore(flags);

   	return todo_tsc_hardware_is_good_enough();
   }

because kvmclock is shifted when master clock is recomputed or disabled,
so it should never happen. (And if it happens, then this bit of
imprecision doesn't matter much.)
The host doesn't need to use TSC either, because master clock uses TSC
frequency that differs from boot_ns -- we only need to be sure that the
TSC hardware doesn't change its frequency.

The small delta introduced with this (rdtsc <-> ktime_get_boot_ns) is
the same that the guest would see if we didn't enable master clock.
kvmclock updates are then normally using the same cycle_now/kernel_ns
pair to remain stable.

I thought about that (and actually tried that first). The concern is 
about L2 guests running over L1 KVM where the margin between cycle_now 
and kernel_ns is unpredictable because L1 KVM can be scheduled at any 
time for relatively long period.
That's why I decided to rework timekeeper and get kernel_time
and corresponding cycle value from there

--
Best,
Denis