On Thu, 2024-06-20 at 14:37 +0200, Peter Hilber wrote: > Changing virtio-dev address to the new one. The discussion might also be > relevant for virtio-comment, but it is discouraged to forward it to both. I will happily take it to whichever forum you think is most appropriate. (And you have my permission to direct replies to whichever you choose.) > On 15.06.24 10:40, David Woodhouse wrote: > > As discussed before, I don't think it makes sense to design a new high- > > precision virtual clock which only gets it right *most* of the time. We > > absolutely need to address the issue of live migration. ... > > Here's a first attempt at defining such a memory structure. For now > > I've done it as a "vmclock" ACPI device based loosely on vmgenid, but I > > think it makes most sense for this to be part of the virtio_rtc spec. > > Ultimately it doesn't matter *how* the guest finds the memory region. > > This looks sensible to me. I also think it would be possible to adapt this for > the virtio-rtc spec. The proposal also supports some other use cases which are > not in the virtio-rtc RFC spec v4 [2], notably vDSO-style clock reading, and > others such as indication of a past leap second. Right. The vDSO-style clock reading is key to solving the live migration problem. The other key thing this adds is the error bounds, which some applications care deeply about. I've been working with the team that owns ClockBound on that part: https://github.com/aws/clock-bound > Compared to the virtio-rtc RFC spec v4 [2], this proposal does not seem to > support leap second smearing. That's kind of intentional. Leap second smearing should be considered the *antithesis* of precise time. Anyone who wants a monotonic realtime clock should be using the POSIX CLOCK_TAI. Part of my motivation for fixing the LM problem is because some financial institutions can incur significant penalties for putting inaccurate timestamps on transactions — even the disruption caused by live migration is enough to trigger that. So deliberately lying to them about what the UTC time is, by up to a second in either direction, is not necessarily in their best interest. As you noted, this proposal does expose leap seconds in the recent past, which is all that's needed to allow a guest to generate a smeared clock *from* the accurate clock that is provided through this mechanism. (Knowledge of past leap seconds is needed because in some modes, smearing adjustments continue for some hours *afte* the leap second should have occurred. So the NTP style of leap indicator isn't sufficient). > Also, it would be helpful to allow indicating > when some of the fields are not valid (such as leapsecond_counter_value, > leapsecond_tai_time, tai_offset_sec, utc_time_maxerror_picosec, ...). Right. For some of those the answer can just be 'zero means invalid', for the max error, perhaps MAX_UINT64. But we should definitely make that explicit. I'm also not entirely sure I understood Julien's insistence that we include the leapsecond_counter_value as *well* as the leapsecond_tai_time. It seems to me that the implementation would have to recalculate that every time the frequency is adjusted. For some of those fields, I was expecting a certain amount of bikeshedding to occur and figured it was better to post an early straw man and solicit feedback. > Do you have plans to contribute this to the Virtio specification and Linux > driver implementation? Yes, absolutely. For now I've implemented it in the Linux guest¹ and in QEMU² as an ACPI device modelled on vmgenid, but I'd love *not* to have to do that, and just to do it based on virtio instead. ¹ https://git.infradead.org/users/dwmw2/linux.git/shortlog/refs/heads/vmclock ² https://git.infradead.org/users/dwmw2/qemu.git/shortlog/refs/heads/vmclock > > +static const struct ptp_clock_info ptp_vmclock_info = { > > + .owner = THIS_MODULE, > > + .max_adj = 0, > > + .n_ext_ts = 0, > > + .n_pins = 0, > > + .pps = 0, > > + .adjfine = ptp_vmclock_adjfine, > > + .adjtime = ptp_vmclock_adjtime, > > + .gettime64 = ptp_vmclock_gettime, > > Should implement .gettimex64 instead. Ack, thanks. I'll go play with that. > > > + > > + /* Counter frequency, and error margin. Units of (second >> 64) */ > > + uint64_t counter_period_frac_sec; > > AFAIU this might limit the precision in case of high counter frequencies. > Could the unit be aligned to the expected frequency band of counters? This field indicates the period of a single tick, in units of 1>>64 of a second. That's about 5.4e-20 seconds, or 54 zeptoseconds? Can you walk me through a calculation where you believe that level of precision is insufficient? I guess the precision matters if the structure isn't updated for a long period of time, and the delta between the current counter and the snapshot is high? That's a *lot* of 54 zeptosecondses? But you really would need a *lot* of them before you care? And if nobody's been calibrating your counter for that long, surely you have bigger worries? Am I missing something there?
<<attachment: smime.p7s>>