On Thu, Dec 14, 2023, David Woodhouse wrote: > From: David Woodhouse <dwmw@xxxxxxxxxxxx> > > A test program such as http://david.woodhou.se/timerlat.c confirms user > reports that timers are increasingly inaccurate as the lifetime of a > guest increases. Reporting the actual delay observed when asking for > 100µs of sleep, it starts off OK on a newly-launched guest but gets > worse over time, giving incorrect sleep times: > > root@ip-10-0-193-21:~# ./timerlat -c -n 5 > 00000000 latency 103243/100000 (3.2430%) > 00000001 latency 103243/100000 (3.2430%) > 00000002 latency 103242/100000 (3.2420%) > 00000003 latency 103245/100000 (3.2450%) > 00000004 latency 103245/100000 (3.2450%) > > The biggest problem is that get_kvmclock_ns() returns inaccurate values > when the guest TSC is scaled. The guest sees a TSC value scaled from the > host TSC by a mul/shift conversion (hopefully done in hardware). The > guest then converts that guest TSC value into nanoseconds using the > mul/shift conversion given to it by the KVM pvclock information. > > But get_kvmclock_ns() performs only a single conversion directly from > host TSC to nanoseconds, giving a different result. A test program at > http://david.woodhou.se/tsdrift.c demonstrates the cumulative error > over a day. > > It's non-trivial to fix get_kvmclock_ns(), although I'll come back to > that. The actual guest hv_clock is per-CPU, and *theoretically* each > vCPU could be running at a *different* frequency. But this patch is > needed anyway because... > > The other issue with Xen timers was that the code would snapshot the > host CLOCK_MONOTONIC at some point in time, and then... after a few > interrupts may have occurred, some preemption perhaps... would also read > the guest's kvmclock. Then it would proceed under the false assumption > that those two happened at the *same* time. Any time which *actually* > elapsed between reading the two clocks was introduced as inaccuracies > in the time at which the timer fired. > > Fix it to use a variant of kvm_get_time_and_clockread(), which reads the > host TSC just *once*, then use the returned TSC value to calculate the > kvmclock (making sure to do that the way the guest would instead of > making the same mistake get_kvmclock_ns() does). > > Sadly, hrtimers based on CLOCK_MONOTONIC_RAW are not supported, so Xen > timers still have to use CLOCK_MONOTONIC. In practice the difference > between the two won't matter over the timescales involved, as the > *absolute* values don't matter; just the delta. > > This does mean a new variant of kvm_get_time_and_clockread() is needed; > called kvm_get_monotonic_and_clockread() because that's what it does. > > Fixes: 536395260582 ("KVM: x86/xen: handle PV timers oneshot mode") > Signed-off-by: David Woodhouse <dwmw@xxxxxxxxxxxx> > --- Dagnabbit, this one is corrupt too :-/