On Wed, 2024-07-03 at 11:56 +0200, Peter Hilber wrote:
> On 02.07.24 20:40, David Woodhouse wrote:
> > On 2 July 2024 19:12:00 BST, Peter Hilber <peter.hilber@xxxxxxxxxxxxxxx> wrote:
> > > On 02.07.24 18:39, David Woodhouse wrote:
> > > > To clarify then, the main types are
> > > >
> > > >  VIRTIO_RTC_CLOCK_UTC == 0
> > > >  VIRTIO_RTC_CLOCK_TAI == 1
> > > >  VIRTIO_RTC_CLOCK_MONOTONIC == 2
> > > >  VIRTIO_RTC_CLOCK_SMEARED_UTC == 3
> > > >
> > > > And the subtypes are *only* for the case of
> > > > VIRTIO_RTC_CLOCK_SMEARED_UTC. They include
> > > >
> > > >  VIRTIO_RTC_SUBTYPE_STRICT
> > > >  VIRTIO_RTC_SUBTYPE_UNDEFINED /* or whatever you want to call it */
> > > >  VIRTIO_RTC_SUBTYPE_SMEAR_NOON_LINEAR
> > > >  VIRTIO_RTC_SUBTYPE_UTC_SLS /* if it's worth doing this one */
> > > >
> > > > Is that what we just agreed on?
> > > >
> > >
> > > This is a misunderstanding. My idea was that the main types are
> > >
> > > >  VIRTIO_RTC_CLOCK_UTC == 0
> > > >  VIRTIO_RTC_CLOCK_TAI == 1
> > > >  VIRTIO_RTC_CLOCK_MONOTONIC == 2
> > > >  VIRTIO_RTC_CLOCK_SMEARED_UTC == 3
> > >
> > > VIRTIO_RTC_CLOCK_MAYBE_SMEARED_UTC == 4
> > >
> > > The subtypes would be (1st for clocks other than
> > > VIRTIO_RTC_CLOCK_SMEARED_UTC, 2nd to last for
> > > VIRTIO_RTC_CLOCK_SMEARED_UTC):
> > >
> > > #define VIRTIO_RTC_SUBTYPE_STRICT 0
> > > #define VIRTIO_RTC_SUBTYPE_SMEAR_NOON_LINEAR 1
> > > #define VIRTIO_RTC_SUBTYPE_SMEAR_UTC_SLS 2
> > >
> >
> > Thanks. I really do think that from the guest point of view there's
> > really no distinction between "maybe smeared" and "undefined
> > smearing", and have a preference for using the latter form, which
> > is the key difference there?
> >
> > Again though, not a hill for me to die on.
>
> I have no issue with staying with "undefined smearing", so would you agree
> to something like
>
> VIRTIO_RTC_CLOCK_SMEAR_UNDEFINED_UTC == 4
>
> (or another name if you prefer)?

Well, the point of contention was really whether that was a *type* or a
*subtype*.
Either way, it's a "precision clock" telling its consumer that the
device *itself* doesn't really know what time is being exposed. Which
seems like a bizarre thing to support.

But I think I've constructed an argument which persuades me to your
point of view that *if* we permit it, it should be a primary type...

A clock can *either* be UTC, *or* it can be monotonic. The whole point
of smearing is to produce a monotonic clock, of course.

VIRTIO_RTC_CLOCK_UTC is UTC. It is not monotonic.

VIRTIO_RTC_CLOCK_SMEARED is, presumably, monotonic (and I think we
should explicitly require that to be true in virtio-rtc).

But VIRTIO_RTC_CLOCK_MAYBE_SMEARED is the worst of both worlds. It is
neither known to be correct UTC, *nor* is it known to be monotonic. So
(again, if we permit it at all) I think it probably does make sense for
that to be a primary type.

This is what I currently have for 'struct vmclock_abi' that I'd like to
persuade you to adopt. I need to tweak it some more, for at least the
following reasons, as well as any more you can see:

 • size isn't big enough for 64KiB pages
 • Should be explicitly little-endian
 • Does it need esterror as well as maxerror?
 • Why is maxerror in picoseconds? It's the only use of that unit
 • Where do the clock_status values come from? Do they make sense?
 • Are signed integers OK? (I think so!).

/*
 * This structure provides a vDSO-style clock to VM guests, exposing the
 * relationship (or lack thereof) between the CPU clock (TSC, timebase, arch
 * counter, etc.) and real time. It is designed to address the problem of
 * live migration, which other clock enlightenments do not.
 *
 * When a guest is live migrated, this affects the clock in two ways.
 *
 * First, even between identical hosts the actual frequency of the underlying
 * counter will change within the tolerances of its specification (typically
 * ±50PPM, or 4 seconds a day). This frequency also varies over time on the
 * same host, but can be tracked by NTP as it generally varies slowly.
 * With live migration there is a step change in the frequency, with no
 * warning.
 *
 * Second, there may be a step change in the value of the counter itself, as
 * its accuracy is limited by the precision of the NTP synchronization on the
 * source and destination hosts.
 *
 * So any calibration (NTP, PTP, etc.) which the guest has done on the source
 * host before migration is invalid, and needs to be redone on the new host.
 *
 * In its most basic mode, this structure provides only an indication to the
 * guest that live migration has occurred. This allows the guest to know that
 * its clock is invalid and take remedial action. For applications that need
 * reliable accurate timestamps (e.g. distributed databases), the structure
 * can be mapped all the way to userspace. This allows the application to see
 * directly for itself that the clock is disrupted and take appropriate
 * action, even when using a vDSO-style method to get the time instead of a
 * system call.
 *
 * In its more advanced mode, this structure can also be used to expose the
 * precise relationship of the CPU counter to real time, as calibrated by the
 * host. This means that userspace applications can have accurate time
 * immediately after live migration, rather than having to pause operations
 * and wait for NTP to recover. This mode does, of course, rely on the
 * counter being reliable and consistent across CPUs.
 *
 * Note that this must be true UTC, never with smeared leap seconds. If a
 * guest wishes to construct a smeared clock, it can do so. Presenting a
 * smeared clock through this interface would be problematic because it
 * actually messes with the apparent counter *period*. A linear smearing
 * of 1 ms per second would effectively tweak the counter period by 1000PPM
 * at the start/end of the smearing period, while a sinusoidal smear would
 * basically be impossible to represent.
 *
 * This structure is offered with the intent that it be adopted into the
 * nascent virtio-rtc standard, as a virtio-rtc that does not address the live
 * migration problem seems a little less than fit for purpose. For that
 * reason, certain fields use precisely the same numeric definitions as in
 * the virtio-rtc proposal. The structure can also be exposed through an ACPI
 * device with the CID "VMCLOCK", modelled on the "VMGENID" device except for
 * the fact that it uses a real _CRS to convey the address of the structure
 * (which should be a full page, to allow for mapping directly to userspace).
 */

#ifndef __VMCLOCK_ABI_H__
#define __VMCLOCK_ABI_H__

#ifdef __KERNEL__
#include <linux/types.h>
#else
#include <stdint.h>
#endif

struct vmclock_abi {
	uint64_t magic;
#define VMCLOCK_MAGIC	0x4b4c4356 /* "VCLK" */
	uint16_t size;		/* Size of page containing this structure */
	uint16_t version;	/* 1 */

	/* Sequence lock. Low bit means an update is in progress. */
	uint32_t seq_count;

	uint32_t flags;
	/* Indicates that the tai_offset_sec field is valid */
#define VMCLOCK_FLAG_TAI_OFFSET_VALID		(1 << 0)
	/*
	 * Optionally used to notify guests of pending maintenance events.
	 * A guest may wish to remove itself from service if an event is
	 * coming up. Two flags indicate the rough imminence of the event.
	 */
#define VMCLOCK_FLAG_DISRUPTION_SOON		(1 << 1) /* About a day */
#define VMCLOCK_FLAG_DISRUPTION_IMMINENT	(1 << 2) /* About an hour */
	/* Indicates that the utc_time_maxerror_picosec field is valid */
#define VMCLOCK_FLAG_UTC_MAXERROR_VALID		(1 << 3)
	/* Indicates counter_period_error_rate_frac_sec is valid */
#define VMCLOCK_FLAG_PERIOD_ERROR_VALID		(1 << 4)

	/*
	 * This field changes to another non-repeating value when the CPU
	 * counter is disrupted, for example on live migration. This lets
	 * the guest know that it should discard any calibration it has
	 * performed of the counter against external sources (NTP/PTP/etc.).
	 */
	uint64_t disruption_marker;

	uint8_t clock_status;
#define VMCLOCK_STATUS_UNKNOWN		0
#define VMCLOCK_STATUS_INITIALIZING	1
#define VMCLOCK_STATUS_SYNCHRONIZED	2
#define VMCLOCK_STATUS_FREERUNNING	3
#define VMCLOCK_STATUS_UNRELIABLE	4

	uint8_t counter_id; /* Matches VIRTIO_RTC_COUNTER_xxx */
#define VMCLOCK_COUNTER_ARM_VCNT	0
#define VMCLOCK_COUNTER_X86_TSC		1
#define VMCLOCK_COUNTER_INVALID		0xff

	/*
	 * By providing the offset from UTC to TAI, the guest can know both
	 * UTC and TAI reliably, whichever is indicated in the time_type
	 * field. Valid if VMCLOCK_FLAG_TAI_OFFSET_VALID is set in flags.
	 */
	int16_t tai_offset_sec;

	/*
	 * What time is exposed in the time_sec/time_frac_sec fields?
	 */
	uint8_t time_type; /* Matches VIRTIO_RTC_TYPE_xxx */
#define VMCLOCK_TIME_UTC			0 /* Since 1970-01-01 00:00:00z */
#define VMCLOCK_TIME_TAI			1 /* Since 1970-01-01 00:00:00z */
#define VMCLOCK_TIME_MONOTONIC			2 /* Since undefined epoch */
#define VMCLOCK_TIME_INVALID_SMEARED		3 /* Not supported */
#define VMCLOCK_TIME_INVALID_MAYBE_SMEARED	4 /* Not supported */

	/*
	 * The time exposed through this device is never smeared. This field
	 * corresponds to the 'subtype' field in virtio-rtc, which indicates
	 * the smearing method. However in this case it provides a *hint* to
	 * the guest operating system, such that *if* the guest OS wants to
	 * provide its users with an alternative clock which does not follow
	 * the POSIX CLOCK_REALTIME standard, it may do so in a fashion
	 * consistent with the other systems in the nearby environment.
	 */
	uint8_t leap_second_smearing_hint; /* Matches VIRTIO_RTC_SUBTYPE_xxx */
#define VMCLOCK_SMEARING_STRICT		0
#define VMCLOCK_SMEARING_NOON_LINEAR	1
#define VMCLOCK_SMEARING_UTC_SLS	2

	/* Bit shift for counter_period_frac_sec and its error rate */
	uint8_t counter_period_shift;

	/*
	 * Unlike in NTP, this can indicate a leap second in the past.
	 * This is needed to allow guests to derive an imprecise clock with
	 * smeared leap seconds for themselves, as some modes of smearing
	 * need the adjustments to continue even after the moment at which
	 * the leap second should have occurred.
	 */
	uint8_t leap_indicator; /* Matches VIRTIO_RTC_LEAP_xxx */
#define VMCLOCK_LEAP_NONE	0
#define VMCLOCK_LEAP_PRE_POS	1
#define VMCLOCK_LEAP_PRE_NEG	2
#define VMCLOCK_LEAP_POS	3
#define VMCLOCK_LEAP_NEG	4

	uint64_t leapsecond_tai_sec; /* Since 1970-01-01 00:00:00z */

	/*
	 * Paired values of counter and UTC at a given point in time.
	 */
	uint64_t counter_value;
	uint64_t time_sec;
	uint64_t time_frac_sec;

	/*
	 * Counter frequency, and error margin. The unit of these fields is
	 * seconds >> (64 + counter_period_shift)
	 */
	uint64_t counter_period_frac_sec;
	uint64_t counter_period_error_rate_frac_sec;

	/* Error margin of UTC reading above (± picoseconds) */
	uint64_t utc_time_maxerror_picosec;
};

#endif /* __VMCLOCK_ABI_H__ */
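
For illustration, a guest could consume the seq_count and
disruption_marker fields roughly as follows. This is only a minimal
sketch, not part of the proposal: struct vmclock_demo is a hypothetical
cut-down stand-in for struct vmclock_abi (it does not reproduce the
real layout), the vmclock_demo_* names are invented here, and it
assumes the host makes seq_count odd before an update and even again
afterwards, which the "low bit" comment implies but does not spell out.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical cut-down stand-in for struct vmclock_abi; NOT the real layout. */
struct vmclock_demo {
	_Atomic uint32_t seq_count;	/* low bit set: update in progress */
	uint64_t disruption_marker;
	uint64_t counter_value;
	uint64_t time_sec;
	uint64_t time_frac_sec;
};

/* Private copy of the fields, taken while seq_count was stable. */
struct vmclock_demo_snapshot {
	uint64_t disruption_marker;
	uint64_t counter_value;
	uint64_t time_sec;
	uint64_t time_frac_sec;
};

/* Returns true if a consistent snapshot was obtained within max_tries. */
static inline bool vmclock_demo_read(struct vmclock_demo *clk,
				     struct vmclock_demo_snapshot *out,
				     unsigned int max_tries)
{
	assert(clk && out);

	for (unsigned int i = 0; i < max_tries; i++) {
		uint32_t start = atomic_load_explicit(&clk->seq_count,
						      memory_order_acquire);
		if (start & 1)
			continue;	/* writer active, try again */

		/* Plain loads for brevity; real code would want READ_ONCE-style accesses. */
		out->disruption_marker = clk->disruption_marker;
		out->counter_value = clk->counter_value;
		out->time_sec = clk->time_sec;
		out->time_frac_sec = clk->time_frac_sec;

		/* Order the data loads before the re-read of seq_count. */
		atomic_thread_fence(memory_order_acquire);
		if (atomic_load_explicit(&clk->seq_count,
					 memory_order_relaxed) == start)
			return true;
	}
	return false;
}

/* True if the marker differs from the one seen at the last calibration. */
static inline bool vmclock_demo_disrupted(const struct vmclock_demo_snapshot *snap,
					  uint64_t last_marker)
{
	return snap->disruption_marker != last_marker;
}
```

A caller would remember the disruption_marker from the snapshot it last
calibrated against, and throw that calibration away as soon as
vmclock_demo_disrupted() returns true for a newer snapshot.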
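
To make the fixed-point units concrete, here is a rough sketch of how a
guest might turn a counter reading into a time using the
counter_value/time_sec/time_frac_sec pair and counter_period_frac_sec.
It rests on assumptions that are not stated in the header: that
time_frac_sec is in units of 2^-64 seconds, that the period is applied
as (delta * period) >> counter_period_shift, and that the compiler
provides the GCC/Clang 'unsigned __int128' extension. The
vmclock_demo_* names are invented for this example.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical result type: seconds plus a fraction in units of 2^-64 s. */
struct vmclock_demo_time {
	uint64_t sec;
	uint64_t frac_sec;
};

/*
 * Extrapolate from the (counter_value, time_sec, time_frac_sec) reference
 * point to counter_now. period_frac_sec is assumed to be in units of
 * seconds >> (64 + period_shift), as in the header comment.
 */
static inline struct vmclock_demo_time
vmclock_demo_counter_to_time(uint64_t counter_now, uint64_t counter_value,
			     uint64_t time_sec, uint64_t time_frac_sec,
			     uint64_t period_frac_sec, uint8_t period_shift)
{
	assert(period_shift < 64);

	/* Ticks since the reference point (assumes counter_now >= counter_value). */
	uint64_t delta = counter_now - counter_value;

	/* 64x64 -> 128 bit product, then drop the extra precision bits. */
	unsigned __int128 elapsed =
		((unsigned __int128)delta * period_frac_sec) >> period_shift;

	struct vmclock_demo_time t;
	t.sec = time_sec + (uint64_t)(elapsed >> 64);
	t.frac_sec = time_frac_sec + (uint64_t)elapsed;
	if (t.frac_sec < time_frac_sec)
		t.sec++;	/* carry out of the fractional part */
	return t;
}
```

For example, with a period of 1ULL << 63 and a shift of 0 (half a
second per tick), three ticks after a reference point of 100.5 seconds
this yields 102 seconds exactly. The maxerror and period error fields
are ignored here; a real consumer would need to fold them into its own
error bound.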