Re: [RFC PATCH v4] ptp: Add vDSO-style vmclock support

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On 08.07.24 11:27, David Woodhouse wrote:
> From: David Woodhouse <dwmw@xxxxxxxxxxxx>
> 
> The vmclock "device" provides a shared memory region with precision clock
> information. By using shared memory, it is safe across Live Migration.
> 
> Like the KVM PTP clock, this can convert TSC-based cross timestamps into
> KVM clock values. Unlike the KVM PTP clock, it does so only when such is
> actually helpful.
> 
> The memory region of the device is also exposed to userspace so it can be
> read or memory mapped by application which need reliable notification of
> clock disruptions.
> 
> Signed-off-by: David Woodhouse <dwmw@xxxxxxxxxxxx>

[...]

> +
> +struct vmclock_abi {
> +	/* CONSTANT FIELDS */
> +	uint32_t magic;
> +#define VMCLOCK_MAGIC	0x4b4c4356 /* "VCLK" */
> +	uint32_t size;		/* Size of region containing this structure */
> +	uint16_t version;	/* 1 */
> +	uint8_t counter_id; /* Matches VIRTIO_RTC_COUNTER_xxx except INVALID */
> +#define VMCLOCK_COUNTER_ARM_VCNT	0
> +#define VMCLOCK_COUNTER_X86_TSC		1
> +#define VMCLOCK_COUNTER_INVALID		0xff
> +	uint8_t time_type; /* Matches VIRTIO_RTC_TYPE_xxx */
> +#define VMCLOCK_TIME_UTC			0	/* Since 1970-01-01 00:00:00z */
> +#define VMCLOCK_TIME_TAI			1	/* Since 1970-01-01 00:00:00z */
> +#define VMCLOCK_TIME_MONOTONIC			2	/* Since undefined epoch */
> +#define VMCLOCK_TIME_INVALID_SMEARED		3	/* Not supported */
> +#define VMCLOCK_TIME_INVALID_MAYBE_SMEARED	4	/* Not supported */
> +
> +	/* NON-CONSTANT FIELDS PROTECTED BY SEQCOUNT LOCK */
> +	uint32_t seq_count;	/* Low bit means an update is in progress */
> +	/*
> +	 * This field changes to another non-repeating value when the CPU
> +	 * counter is disrupted, for example on live migration. This lets
> +	 * the guest know that it should discard any calibration it has
> +	 * performed of the counter against external sources (NTP/PTP/etc.).
> +	 */
> +	uint64_t disruption_marker;
> +	uint64_t flags;
> +	/* Indicates that the tai_offset_sec field is valid */
> +#define VMCLOCK_FLAG_TAI_OFFSET_VALID		(1 << 0)
> +	/*
> +	 * Optionally used to notify guests of pending maintenance events.
> +	 * A guest which provides latency-sensitive services may wish to
> +	 * remove itself from service if an event is coming up. Two flags
> +	 * indicate the approximate imminence of the event.
> +	 */
> +#define VMCLOCK_FLAG_DISRUPTION_SOON		(1 << 1) /* About a day */
> +#define VMCLOCK_FLAG_DISRUPTION_IMMINENT	(1 << 2) /* About an hour */
> +#define VMCLOCK_FLAG_PERIOD_ESTERROR_VALID	(1 << 3)
> +#define VMCLOCK_FLAG_PERIOD_MAXERROR_VALID	(1 << 4)
> +#define VMCLOCK_FLAG_TIME_ESTERROR_VALID	(1 << 5)
> +#define VMCLOCK_FLAG_TIME_MAXERROR_VALID	(1 << 6)
> +	/*
> +	 * Even regardless of leap seconds, the time presented through this
> +	 * mechanism may not be strictly monotonic. If the counter slows down
> +	 * and the host adapts to this discovery, the time calculated from
> +	 * the value of the counter immediately after an update to this
> +	 * structure, may appear to be *earlier* than a calculation just
> +	 * before the update (while the counter was believed to be running
> +	 * faster than it now is). A guest operating system will typically
> +	 * *skew* its own system clock back towards the reference clock
> +	 * exposed here, rather than following this clock directly. If,
> +	 * however, this structure is being populated from such a system
> +	 * clock which is already handled in such a fashion and the results
> +	 * *are* guaranteed to be monotonic, such monotonicity can be
> +	 * advertised by setting this bit.
> +	 */

I wonder if this might be difficult to define in a standard.

Is there a need to define device and driver behavior in more detail? What
would happen if e.g. the device first decides how to update the clock, but
is then slow to update the SHM?

> +#define VMCLOCK_FLAG_TIME_MONOTONIC		(1 << 7)
> +
> +	uint8_t pad[2];
> +	uint8_t clock_status;
> +#define VMCLOCK_STATUS_UNKNOWN		0
> +#define VMCLOCK_STATUS_INITIALIZING	1
> +#define VMCLOCK_STATUS_SYNCHRONIZED	2
> +#define VMCLOCK_STATUS_FREERUNNING	3
> +#define VMCLOCK_STATUS_UNRELIABLE	4
> +
> +	/*
> +	 * The time exposed through this device is never smeared. This field
> +	 * corresponds to the 'subtype' field in virtio-rtc, which indicates
> +	 * the smearing method. However in this case it provides a *hint* to
> +	 * the guest operating system, such that *if* the guest OS wants to
> +	 * provide its users with an alternative clock which does not follow
> +	 * the POSIX CLOCK_REALTIME standard, it may do so in a fashion
> +	 * consistent with the other systems in the nearby environment.

AFAIU the POSIX.1-2017 standard does not mandate UTC, esp. not w.r.t.
leap seconds [1, A.4.16 Seconds Since the Epoch]:

> Those applications which do care about leap seconds can determine how to
> handle them in whatever way those applications feel is best. This was
> particularly emphasized because there was disagreement about what the best
> way of handling leap seconds might be. It is a practical impossibility to
> mandate that a conforming implementation must have a fixed relationship to
> any particular official clock (consider isolated systems, or systems
> performing "reruns" by setting the clock to some arbitrary time).

So the above comment should probably refer to UTC instead of POSIX
CLOCK_REALTIME.

> +	 */
> +	uint8_t leap_second_smearing_hint; /* Matches VIRTIO_RTC_SUBTYPE_xxx */
> +#define VMCLOCK_SMEARING_STRICT		0
> +#define VMCLOCK_SMEARING_NOON_LINEAR	1
> +#define VMCLOCK_SMEARING_UTC_SLS	2
> +	int16_t tai_offset_sec;
> +	uint8_t leap_indicator; /* Based on VIRTIO_RTC_LEAP_xxx */
> +#define VMCLOCK_LEAP_NONE	0	/* No known nearby leap second */
> +#define VMCLOCK_LEAP_PRE_POS	1	/* Leap second + at end of month */

A positive leap second usually means stepping the clock backwards, so 
`Leap second +` is somewhat confusing.

> +#define VMCLOCK_LEAP_PRE_NEG	2	/* Leap second - at end of month */
> +#define VMCLOCK_LEAP_POS	3	/* Set during 23:59:60 second */
> +#define VMCLOCK_LEAP_NEG	4	/* Not used in VMCLOCK */
> +	/*
> +	 * These values are not (yet) in virtio-rtc. They indicate that a
> +	 * leap second *has* occurred at the start of the month. This allows
> +	 * a guest to generate a smeared clock from the accurate clock which
> +	 * this device provides, as smearing may need to continue for up to a
> +	 * period of time *after* the point of the leap second itself. Must
> +	 * be cleared by the 15th day of the month.
> +	 */
> +#define VMCLOCK_LEAP_POST_POS	5
> +#define VMCLOCK_LEAP_POST_NEG	6

I think it can still be discussed in the context of virtio-rtc whether we
should add dedicated identifiers for message-based smeared clock readouts.


[1] https://pubs.opengroup.org/onlinepubs/9699919799/




[Index of Archives]     [KVM Development]     [Libvirt Development]     [Libvirt Users]     [CentOS Virtualization]     [Netdev]     [Ethernet Bridging]     [Linux Wireless]     [Kernel Newbies]     [Security]     [Linux for Hams]     [Netfilter]     [Bugtraq]     [Yosemite Forum]     [MIPS Linux]     [ARM Linux]     [Linux RAID]     [Linux Admin]     [Samba]

  Powered by Linux