Re: [PATCH v3] posix-timers: add multi_clock_gettime system call

"Arnd Bergmann" <arnd@xxxxxxxx> · Fri, 29 Dec 2023 16:26:46 +0100

On Thu, Dec 28, 2023, at 13:24, Sagi Maimon wrote:
> Some user space applications need to read some clocks.
> Each read requires moving from user space to kernel space.
> The syscall overhead causes unpredictable delay between N clocks reads
> Removing this delay causes better synchronization between N clocks.
>
> Introduce a new system call multi_clock_gettime, which can be used to measure
> the offset between multiple clocks, from variety of types: PHC, virtual PHC
> and various system clocks (CLOCK_REALTIME, CLOCK_MONOTONIC, etc).
> The offset includes the total time that the driver needs to read the clock
> timestamp.
>
> New system call allows the reading of a list of clocks - up to PTP_MAX_CLOCKS.
> Supported clocks IDs: PHC, virtual PHC and various system clocks.
> Up to PTP_MAX_SAMPLES times (per clock) in a single system call read.
> The system call returns n_clocks timestamps for each measurement:
> - clock 0 timestamp
> - ...
> - clock n timestamp
>
> Signed-off-by: Sagi Maimon <maimon.sagi@xxxxxxxxx>

Hi Sagi,

Exposing an interface to read multiple clocks makes sense to me,
but I wonder if the interface you use is too inflexible.

> --- a/include/uapi/asm-generic/unistd.h
> +++ b/include/uapi/asm-generic/unistd.h
> @@ -828,9 +828,11 @@ __SYSCALL(__NR_futex_wake, sys_futex_wake)
>  __SYSCALL(__NR_futex_wait, sys_futex_wait)
>  #define __NR_futex_requeue 456
>  __SYSCALL(__NR_futex_requeue, sys_futex_requeue)
> +#define __NR_multi_clock_gettime 457
> +__SYSCALL(__NR_multi_clock_gettime, sys_multi_clock_gettime)
> 
>  #undef __NR_syscalls
> -#define __NR_syscalls 457
> +#define __NR_syscalls 458

Side note: hooking it up only here is sufficient for the
code review but not for inclusion: once we have an agreement
on the API, this should be added to all architectures at once.

> +#define MULTI_PTP_MAX_CLOCKS 5 /* Max number of clocks */
> +#define MULTI_PTP_MAX_SAMPLES 10 /* Max allowed offset measurement samples. */
> +
> +struct __ptp_multi_clock_get {
> +	unsigned int n_clocks; /* Desired number of clocks. */
> +	unsigned int n_samples; /* Desired number of measurements per clock. */
> +	clockid_t clkid_arr[MULTI_PTP_MAX_CLOCKS]; /* list of clock IDs */
> +	/*
> +	 * Array of list of n_clocks clocks time samples n_samples times.
> +	 */
> +	struct  __kernel_timespec ts[MULTI_PTP_MAX_SAMPLES][MULTI_PTP_MAX_CLOCKS];
> +};

The fixed-size arrays here seem to be an unnecessary limitation:
both MULTI_PTP_MAX_SAMPLES and MULTI_PTP_MAX_CLOCKS are small
enough that one can come up with scenarios where you would want
a higher number, but at the same time the structure is already
more than 800 bytes long, which is more than you'd normally want
to put on the kernel stack, and which may take a significant time
to copy to and from userspace.
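
To make the size argument concrete, here is a rough userspace mirror of
the proposed structure. It is only a sketch: struct ts64_mirror is my
stand-in for struct __kernel_timespec (assumed to be two 64-bit fields),
and the helper names are made up for illustration. The exact total
depends on the padding the ABI inserts before the ts[] array.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/types.h>	/* clockid_t */

#define MULTI_PTP_MAX_CLOCKS 5
#define MULTI_PTP_MAX_SAMPLES 10

/* Stand-in for struct __kernel_timespec: two 64-bit fields (assumption). */
struct ts64_mirror {
	long long tv_sec;
	long long tv_nsec;
};

/* Userspace mirror of the proposed struct __ptp_multi_clock_get. */
struct multi_get_mirror {
	unsigned int n_clocks;
	unsigned int n_samples;
	clockid_t clkid_arr[MULTI_PTP_MAX_CLOCKS];
	struct ts64_mirror ts[MULTI_PTP_MAX_SAMPLES][MULTI_PTP_MAX_CLOCKS];
};

/* Bytes taken by the timestamp matrix alone: 10 * 5 * 16. */
size_t ts_bytes(void)
{
	return sizeof(((struct multi_get_mirror *)0)->ts);
}

/* Bytes that a copy of the whole structure has to move. */
size_t total_bytes(void)
{
	return sizeof(struct multi_get_mirror);
}
```

The timestamp matrix dominates the total, and it is pure output, so
none of it needs to travel from userspace into the kernel.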

Since n_clocks and n_samples are always inputs to the syscall,
you can just pass them as register arguments and use a dynamically
sized array instead.
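
Something along these lines is what I have in mind for the calling
convention. This is a userspace model rather than kernel code: the
function name, the argument order and the sample-major layout of the
output array are all assumptions for the sake of illustration, and
clock_gettime() stands in for the per-clock read inside the kernel.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <time.h>

/*
 * Hypothetical model of the suggested interface: the counts travel as
 * plain arguments and the caller owns both arrays. ts must have room
 * for n_samples * n_clocks entries, laid out sample-major:
 * ts[sample * n_clocks + clock].
 */
int multi_clock_gettime_model(const clockid_t *clkids, unsigned int n_clocks,
			      unsigned int n_samples, struct timespec *ts)
{
	unsigned int i, j;

	if (!clkids || !ts || n_clocks == 0 || n_samples == 0)
		return -EINVAL;

	for (j = 0; j < n_samples; j++) {
		for (i = 0; i < n_clocks; i++) {
			if (clock_gettime(clkids[i], &ts[(size_t)j * n_clocks + i]))
				return -errno;
		}
	}
	return 0;
}
```

With this shape there is no compile-time limit on either dimension, and
only the clock ID list has to be read from the caller.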

It's not clear to me what you gain from having the n_samples
argument over just calling the syscall repeatedly. Does
this offer a benefit for accuracy, or is this just meant to
avoid syscall overhead?

> +SYSCALL_DEFINE1(multi_clock_gettime, struct __ptp_multi_clock_get __user *, ptp_multi_clk_get)
> +{
> +	const struct k_clock *kc;
> +	struct timespec64 kernel_tp;
> +	struct __ptp_multi_clock_get multi_clk_get;
> +	unsigned int i, j;
> +	int error;
> +
> +	if (copy_from_user(&multi_clk_get, ptp_multi_clk_get, sizeof(multi_clk_get)))
> +		return -EFAULT;

Here you copy the entire structure from userspace, but
I don't actually see the .ts[] array on the stack being
accessed later as you just copy to the user pointer
directly.

> +		for (i = 0; i < multi_clk_get.n_clocks; i++) {
> +			kc = clockid_to_kclock(multi_clk_get.clkid_arr[i]);
> +			if (!kc)
> +				return -EINVAL;
> +			error = kc->clock_get_timespec(multi_clk_get.clkid_arr[i], &kernel_tp);
> +			if (!error && put_timespec64(&kernel_tp, (struct __kernel_timespec __user *)
> +						     &ptp_multi_clk_get->ts[j][i]))
> +				error = -EFAULT;
> +		}

The put_timespec64() and possibly the clockid_to_kclock() have
some overhead that may introduce jitter, so it may be better to
pull that out of the loop and have a fixed-size array
of timespec64 values on the stack and then copy them
at the end.
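
Roughly, the structure I mean is: validate first, read back to back,
copy once. The following is a userspace model under stated assumptions,
not the kernel implementation: clock_getres() stands in for the
clockid_to_kclock() lookup, clock_gettime() for kc->clock_get_timespec(),
memcpy() for the copy to the user buffer, and MODEL_MAX_CLOCKS is a
made-up bound for the on-stack array.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <string.h>
#include <time.h>

#define MODEL_MAX_CLOCKS 5	/* hypothetical bound for the stack buffer */

int sample_then_copy(const clockid_t *clkids, unsigned int n_clocks,
		     struct timespec *out)
{
	struct timespec buf[MODEL_MAX_CLOCKS];
	unsigned int i;

	if (!clkids || !out || n_clocks == 0 || n_clocks > MODEL_MAX_CLOCKS)
		return -EINVAL;

	/* Phase 1: validate every clock ID before the first reading. */
	for (i = 0; i < n_clocks; i++)
		if (clock_getres(clkids[i], NULL))
			return -EINVAL;

	/* Phase 2: back-to-back readings, nothing else in the loop. */
	for (i = 0; i < n_clocks; i++)
		if (clock_gettime(clkids[i], &buf[i]))
			return -errno;

	/* Phase 3: a single copy after all clocks have been read. */
	memcpy(out, buf, n_clocks * sizeof(buf[0]));
	return 0;
}
```

The point is that the gap between two consecutive readings then contains
only the clock read itself, not a lookup or a user copy.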

On the other hand, this will still give less accuracy than the
getcrosststamp() callback with ioctl(PTP_SYS_OFFSET_PRECISE),
so either the last bit of accuracy isn't all that important,
or you need to refine the interface to actually be an
improvement over the chardev.
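
For comparison, this is approximately what the existing chardev path
looks like from userspace. It is a minimal sketch: the helper name is
made up, the caller is assumed to have opened the PTP character device
already, and the ioctl only succeeds when the driver implements
cross-timestamping.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <sys/ioctl.h>
#include <linux/ptp_clock.h>

/*
 * Read one cross-timestamp from an open PTP character device.
 * On success, *offset_ns is the device time minus the system realtime
 * from the same cross-timestamp. Returns 0 or a negative errno.
 */
int phc_precise_offset_ns(int fd, int64_t *offset_ns)
{
	struct ptp_sys_offset_precise p = { 0 };

	if (!offset_ns)
		return -EINVAL;
	if (ioctl(fd, PTP_SYS_OFFSET_PRECISE, &p))
		return -errno;

	*offset_ns = ((int64_t)p.device.sec - (int64_t)p.sys_realtime.sec) * 1000000000LL
		   + ((int64_t)p.device.nsec - (int64_t)p.sys_realtime.nsec);
	return 0;
}
```

A caller would open something like /dev/ptp0 (the path is an assumption
and varies per system) and pass the resulting descriptor in.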

      Arnd