On 13.03.24 15:06, David Woodhouse wrote:
> On Wed, 2024-03-13 at 13:58 +0100, Alexandre Belloni wrote:
>> The TSC or whatever CPU counter/clock that is used to keep the system
>> time is not an RTC, I don't get why it has to be exposed as such to the
>> guests. PTP is fine and precise, RTC is not.
>
> Ah, I see. But the point of the virtio_rtc is not really to expose that
> CPU counter. The point is to report the wallclock time, just like an
> actual RTC. The real difference is the *precision*.
>
> The virtio_rtc device has a facility to *also* expose the counter,
> because that's what we actually need to gain that precision...
>
> Applications don't read the RTC every time they want to know what the
> time is. These days, they don't even make a system call; it's done
> entirely in userspace mode. The kernel exposes some shared memory,
> essentially saying "the counter was X at time Y, and runs at Z Hz".
> Then applications just read the CPU counter and do some arithmetic.
>
> As we require more and more precision in the calibration, it becomes
> important to get *paired* readings of the CPU counter and the wallclock
> time at precisely the same moment. If the guest has to read one and
> then the other, potentially taking interrupts, getting preempted and
> suffering steal/SMI time in the middle, that introduces an error which
> is increasingly significant as we increasingly care about precision.
>
> Peter's proposal exposes the pairs of {X,Y} and leaves *all* the guest
> kernels having to repeat readings over time and perform the calibration
> as the underlying hardware oscillator frequency (Z) drifts with
> temperature. I'm trying to get him to let the hypervisor expose the
> calibrated frequency Z too. Along with *error* bounds for ±δX and ±δZ.
> Which aside from reducing the duplication of effort, will *also* fix
> the problem of live migration where *all* those things suffer a step
> change and leave the guest with an inaccurate clock but not knowing it.
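To make the quoted "counter was X at time Y, and runs at Z Hz" arithmetic concrete, here is a minimal sketch in C. The struct and function names are made up for illustration; they are not taken from virtio_rtc or from the kernel's vDSO code, and the layout is only an assumption about what such a record could look like.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical calibration record: "the counter was X at time Y,
 * and runs at Z Hz". Field names are illustrative only.
 */
struct clock_calib {
	uint64_t counter_x;  /* X: counter value at the reference instant */
	uint64_t time_y_ns;  /* Y: wallclock time at that instant, in ns */
	uint64_t freq_z_hz;  /* Z: calibrated counter frequency, in Hz */
};

/*
 * Extrapolate the wallclock time (in ns) for a fresh counter reading.
 * The elapsed ticks are split into whole seconds and a remainder so the
 * multiplication by 1e9 stays within 64 bits for frequencies below
 * roughly 18 GHz (an assumption of this sketch).
 */
static uint64_t calib_to_ns(const struct clock_calib *c, uint64_t counter_now)
{
	assert(c->freq_z_hz != 0);

	uint64_t delta = counter_now - c->counter_x; /* ticks since (X, Y) */
	uint64_t secs = delta / c->freq_z_hz;
	uint64_t rem = delta % c->freq_z_hz;

	return c->time_y_ns
	       + secs * 1000000000ULL
	       + rem * 1000000000ULL / c->freq_z_hz;
}
```

With only the {X,Y} pair, a guest has to derive freq_z_hz itself from repeated readings; if the device supplied Z as well, a consumer could apply this arithmetic directly. Production code would normally replace the divisions with a precomputed multiply-and-shift.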
I am already convinced that this would work significantly better than the
{X,Y} pair (but would be a bit more effort to implement):

1. when accessed by user space, obviously

2. when backing the PTP clock, it saves CPU time and makes non-paired
   reads more precise.

I would just prefer to try upstreaming the {X,Y} pairing first. I think the
{X,Y,Z...} pairing could be discussed and developed in parallel.

Thanks for the comments,

Peter