Re: [RFC PATCH v3 0/7] Add virtio_rtc module and related changes

Peter Hilber <peter.hilber@xxxxxxxxxxxxxxx> · Wed, 13 Mar 2024 18:50:59 +0100

On 13.03.24 15:06, David Woodhouse wrote:
> On Wed, 2024-03-13 at 13:58 +0100, Alexandre Belloni wrote:
>> The TSC or whatever CPU counter/clock that is used to keep the system
>> time is not an RTC, I don't get why it has to be exposed as such to the
>> guests. PTP is fine and precise, RTC is not.
> 
> Ah, I see. But the point of the virtio_rtc is not really to expose that
> CPU counter. The point is to report the wallclock time, just like an
> actual RTC. The real difference is the *precision*.
> 
> The virtio_rtc device has a facility to *also* expose the counter,
> because that's what we actually need to gain that precision...
> 
> Applications don't read the RTC every time they want to know what the
> time is. These days, they don't even make a system call; it's done
> entirely in userspace mode. The kernel exposes some shared memory,
> essentially saying "the counter was X at time Y, and runs at Z Hz".
> Then applications just read the CPU counter and do some arithmetic.
> 
> As we require more and more precision in the calibration, it becomes
> important to get *paired* readings of the CPU counter and the wallclock
> time at precisely the same moment. If the guest has to read one and
> then the other, potentially taking interrupts, getting preempted and
> suffering steal/SMI time in the middle, that introduces an error which
> is increasingly significant as we increasingly care about precision.
> 
> Peter's proposal exposes the pairs of {X,Y} and leaves *all* the guest
> kernels having to repeat readings over time and perform the calibration
> as the underlying hardware oscillator frequency (Z) drifts with
> temperature. I'm trying to get him to let the hypervisor expose the
> calibrated frequency Z too. Along with *error* bounds for ±δX and ±δZ.
> Which aside from reducing the duplication of effort, will *also* fix
> the problem of live migration where *all* those things suffer a step
> change and leave the guest with an inaccurate clock but not knowing it.

I am already convinced that also exposing the calibrated frequency Z and
the error bounds would work significantly better than the bare {X,Y} pair
(though it would take a bit more effort to implement):

1. when accessed by user space, obviously

2. when backing the PTP clock, where it saves CPU time and makes
   non-paired reads more precise.

I would just prefer to try upstreaming the {X,Y} pairing first. I think the
{X,Y,Z...} pairing could be discussed and developed in parallel.
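
For reference, the arithmetic discussed above ("the counter was X at time
Y, and runs at Z Hz") can be sketched roughly as follows. This is only an
illustration under assumptions: the names ClockSnapshot, wallclock_ns and
error_bound_ns are hypothetical, the field layout is not the virtio_rtc
interface or the vDSO data page, and the error bound is a simple
first-order estimate from δX and δZ rather than anything specified.

```python
from dataclasses import dataclass

NSEC_PER_SEC = 1_000_000_000


@dataclass(frozen=True)
class ClockSnapshot:
    """Hypothetical {X, Y, Z} reading plus error bounds (illustrative only)."""
    counter: int          # X: CPU counter value at the paired reading
    wall_ns: int          # Y: wallclock time (ns) at that same instant
    freq_hz: int          # Z: calibrated counter frequency in Hz
    counter_err: int = 0  # assumed bound on X, in counter ticks
    freq_err_hz: int = 0  # assumed bound on Z, in Hz


def wallclock_ns(snap: ClockSnapshot, counter_now: int) -> int:
    """Extrapolate wallclock time: Y + (counter_now - X) / Z, in ns."""
    ticks = counter_now - snap.counter
    return snap.wall_ns + ticks * NSEC_PER_SEC // snap.freq_hz


def error_bound_ns(snap: ClockSnapshot, counter_now: int) -> float:
    """First-order bound on the extrapolation error, in ns.

    The pairing term is fixed; the frequency term grows with the number
    of ticks elapsed since the snapshot.
    """
    ticks = abs(counter_now - snap.counter)
    pairing = snap.counter_err * NSEC_PER_SEC / snap.freq_hz
    drift = ticks * NSEC_PER_SEC * snap.freq_err_hz / snap.freq_hz ** 2
    return pairing + drift
```

With only the {X,Y} pair, the guest has to estimate Z itself from repeated
readings; with Z and the bounds supplied, a consumer could evaluate both
functions directly from a single snapshot.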

Thanks for the comments,

Peter