Re: [RFC PATCH v2] ptp: Add vDSO-style vmclock support

David Woodhouse <dwmw2@xxxxxxxxxxxxx> · Fri, 28 Jun 2024 13:15:47 +0100

On Fri, 2024-06-28 at 13:33 +0200, Peter Hilber wrote:
> On 27.06.24 16:52, David Woodhouse wrote:
> > I already added a flags field, so this might look something like:
> > 
> >         /*
> >          * Smearing flags. The UTC clock exposed through this structure
> >          * is only ever true UTC, but a guest operating system may
> >          * choose to offer a monotonic smeared clock to its users. This
> >          * merely offers a hint about what kind of smearing to perform,
> >          * for consistency with systems in the nearby environment.
> >          */
> > #define VMCLOCK_FLAGS_SMEAR_UTC_SLS (1<<5) /* draft-kuhn-leapsecond-00.txt */
> > 
> > (UTC-SLS is probably a bad example but are there formal definitions for
> > anything else?)
> 
> I think it could also be more generic, like flags for linear smearing,
> cosine smearing(?), and smear_start_sec and smear_end_sec fields (relative
> to the leap second start). That could also represent UTC-SLS, and
> noon-to-noon, and it would be well-defined.
> 
> This should reduce the likelihood that the guest doesn't know the smearing
> variant.

I'm wary of making it too generic. That would seem to encourage a
*proliferation* of false "UTC-like" clocks.

It's bad enough that we do smearing at all, let alone that we don't
have a single definition of how to do it.

I made the smearing hint a full uint8_t instead of using bits in flags,
in the end. That gives us a full 255 ways of lying to users about what
the time is, so we're unlikely to run out. And it's easy enough to add
a new VMCLOCK_SMEARING_XXX type to the 'registry' for any new methods
that get invented.

> > > > +       /*
> > > > +        * This field changes to another non-repeating value when the CPU
> > > > +        * counter is disrupted, for example on live migration.
> > > > +        */
> > > > +       uint64_t disruption_marker;
> > > 
> > > The field could also change when the clock is stepped (leap seconds
> > > excepted), or when the clock frequency is slewed.
> > 
> > I'm not sure. The concept of the disruption marker is that it tells the
> > guest to throw away any calibration of the counter that the guest has
> > done for *itself* (with NTP, other PTP devices, etc.).
> > 
> > One mode for this device would be not to populate the clock fields at
> > all, but *only* to signal disruption when it occurs. So the guest can
> > abort transactions until it's resynced its clocks (to avoid incurring
> > fines if breaking databases, etc.).
> > 
> > Exposing the host timekeeping through the structure means that the
> > migrated guest can keep working because it can trust the timekeeping
> > performed by the (new) host and exposed to it.
> > 
> > If the counter is actually varying in frequency over time, and the host
> > is slewing the clock frequency that it reports, that *isn't* a step
> > change and doesn't mean that the guest should throw away any
> > calibration that it's been doing for itself. One hopes that the guest
> > would have detected the *same* frequency change, and be adapting for
> > itself. So I don't think that should indicate a disruption.
> > 
> > I think the same is even true if the clock is stepped by the host. The
> > actual *counter* hasn't changed, so the guest is better off ignoring
> > the vacillating host and continuing to derive its idea of time from the
> > hardware counter itself, as calibrated against some external NTP/PTP
> > sources. Surely we actively *don't* to tell the guest to throw its own
> > calibrations away, in this case?
> 
> In case the guest is also considering other time sources, it might indeed
> not be a good idea to mix host clock changes into the hardware counter
> disruption marker.
> 
> But if the vmclock is the authoritative source of time, it can still be
> helpful to know about such changes, maybe through another marker.

Could that be the existing seq_count field?

Skewing the counter_period_frac_sec as the underlying oscillator speeds
up and slows down is perfectly normal and expected, and we already
expect the seq_count to change when that happens.

Maybe step changes are different, but arguably if the time advertised
by the host steps *outside* the error bounds previously advertised,
that's just broken?

Depending on how the clock information is fed, a change in seq_count
may even result in non-monotonicity. If the underlying oscillator has
sped up and the structure is updated accordingly, the time calculated
the moment *before* that update may appear later than the time
calculated immediately after it.

It's up to the guest operating system to feed that information into its
own timekeeping system and skew towards correctness instead of stepping
the time it reports to its users.

<<attachment: smime.p7s>>