On Thu, Dec 28, 2023, at 13:24, Sagi Maimon wrote:
> Some user space applications need to read some clocks.
> Each read requires moving from user space to kernel space.
> The syscall overhead causes unpredictable delay between N clocks reads
> Removing this delay causes better synchronization between N clocks.
>
> Introduce a new system call multi_clock_gettime, which can be used to measure
> the offset between multiple clocks, from variety of types: PHC, virtual PHC
> and various system clocks (CLOCK_REALTIME, CLOCK_MONOTONIC, etc).
> The offset includes the total time that the driver needs to read the clock
> timestamp.
>
> New system call allows the reading of a list of clocks - up to PTP_MAX_CLOCKS.
> Supported clocks IDs: PHC, virtual PHC and various system clocks.
> Up to PTP_MAX_SAMPLES times (per clock) in a single system call read.
> The system call returns n_clocks timestamps for each measurement:
> - clock 0 timestamp
> - ...
> - clock n timestamp
>
> Signed-off-by: Sagi Maimon <maimon.sagi@xxxxxxxxx>

Hi Sagi,

Exposing an interface to read multiple clocks makes sense to me,
but I wonder if the interface you use is too inflexible.

> --- a/include/uapi/asm-generic/unistd.h
> +++ b/include/uapi/asm-generic/unistd.h
> @@ -828,9 +828,11 @@ __SYSCALL(__NR_futex_wake, sys_futex_wake)
>  __SYSCALL(__NR_futex_wait, sys_futex_wait)
>  #define __NR_futex_requeue 456
>  __SYSCALL(__NR_futex_requeue, sys_futex_requeue)
> +#define __NR_multi_clock_gettime 457
> +__SYSCALL(__NR_multi_clock_gettime, sys_multi_clock_gettime)
>
>  #undef __NR_syscalls
> -#define __NR_syscalls 457
> +#define __NR_syscalls 458

Side note: hooking it up only here is sufficient for the
code review but not for inclusion: once we have an agreement
on the API, this should be added to all architectures at once.

> +#define MULTI_PTP_MAX_CLOCKS 5 /* Max number of clocks */
> +#define MULTI_PTP_MAX_SAMPLES 10 /* Max allowed offset measurement samples.
> */
> +
> +struct __ptp_multi_clock_get {
> +	unsigned int n_clocks; /* Desired number of clocks. */
> +	unsigned int n_samples; /* Desired number of measurements per clock. */
> +	clockid_t clkid_arr[MULTI_PTP_MAX_CLOCKS]; /* list of clock IDs */
> +	/*
> +	 * Array of list of n_clocks clocks time samples n_samples times.
> +	 */
> +	struct __kernel_timespec ts[MULTI_PTP_MAX_SAMPLES][MULTI_PTP_MAX_CLOCKS];
> +};

The fixed size arrays here seem to be an unnecessary limitation:
both MULTI_PTP_MAX_SAMPLES and MULTI_PTP_MAX_CLOCKS are small
enough that one can come up with scenarios where you would want
a higher number, but at the same time the structure is already
over 800 bytes long, which is more than you'd normally want to put
on the kernel stack, and which may take a significant time to
copy to and from userspace.

Since n_clocks and n_samples are always inputs to the syscall,
you can just pass them as register arguments and use a dynamically
sized array instead.

It's not clear to me what you gain from having the n_samples
argument over just calling the syscall repeatedly. Does
this offer a benefit for accuracy, or is this just meant to
avoid syscall overhead?

> +SYSCALL_DEFINE1(multi_clock_gettime, struct __ptp_multi_clock_get
> __user *, ptp_multi_clk_get)
> +{
> +	const struct k_clock *kc;
> +	struct timespec64 kernel_tp;
> +	struct __ptp_multi_clock_get multi_clk_get;
> +	unsigned int i, j;
> +	int error;
> +
> +	if (copy_from_user(&multi_clk_get, ptp_multi_clk_get,
> sizeof(multi_clk_get)))
> +		return -EFAULT;

Here you copy the entire structure from userspace, but
I don't actually see the .ts[] array on the stack being
accessed later, as you just copy to the user pointer
directly.
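
To make the dynamically sized variant more concrete, here is a
minimal userspace sketch, not kernel code and untested against the
patch. It assumes a hypothetical flat, sample-major layout
(ts[sample * n_clocks + clock]), takes n_clocks and n_samples as
plain arguments, and uses clock_gettime() as a stand-in for the
per-clock read. The names multi_clock_gettime_emul(), ts_index(),
ids_bytes() and ts_bytes() are made up for illustration and are
not part of the patch.

```c
#define _POSIX_C_SOURCE 200809L
#include <stddef.h>
#include <time.h>

/* Hypothetical flat layout: sample-major, ts[sample * n_clocks + clock]. */
static size_t ts_index(unsigned int sample, unsigned int clock,
		       unsigned int n_clocks)
{
	return (size_t)sample * n_clocks + clock;
}

/* Input that would have to be copied in: only the clock IDs. */
static size_t ids_bytes(unsigned int n_clocks)
{
	return (size_t)n_clocks * sizeof(clockid_t);
}

/* Output buffer size the caller has to provide for the timestamps. */
static size_t ts_bytes(unsigned int n_clocks, unsigned int n_samples)
{
	return (size_t)n_clocks * n_samples * sizeof(struct timespec);
}

/*
 * Userspace stand-in for the syscall body: the counts are ordinary
 * arguments and both arrays are sized by the caller.
 * Returns 0 on success, -1 on bad arguments or a failed clock read.
 */
static int multi_clock_gettime_emul(const clockid_t *ids,
				    unsigned int n_clocks,
				    unsigned int n_samples,
				    struct timespec *ts)
{
	unsigned int s, c;

	if (!ids || !ts || !n_clocks || !n_samples)
		return -1;

	for (s = 0; s < n_samples; s++)
		for (c = 0; c < n_clocks; c++)
			if (clock_gettime(ids[c],
					  &ts[ts_index(s, c, n_clocks)]))
				return -1;
	return 0;
}
```

With this shape, only ids_bytes(n_clocks) bytes of input would need
to be copied from userspace, and the caller sizes the output buffer
with ts_bytes(n_clocks, n_samples), so neither limit has to be
baked into the structure.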
> +	for (i = 0; i < multi_clk_get.n_clocks; i++) {
> +		kc = clockid_to_kclock(multi_clk_get.clkid_arr[i]);
> +		if (!kc)
> +			return -EINVAL;
> +		error = kc->clock_get_timespec(multi_clk_get.clkid_arr[i],
> &kernel_tp);
> +		if (!error && put_timespec64(&kernel_tp, (struct __kernel_timespec
> __user *)
> +				&ptp_multi_clk_get->ts[j][i]))
> +			error = -EFAULT;
> +	}

The put_timespec64() call and possibly the clockid_to_kclock()
lookup have some overhead that may introduce jitter, so it may
be better to pull them out of the loop: keep a fixed-size array
of timespec64 values on the stack and then copy them to
userspace at the end.

On the other hand, this will still give less accuracy than the
getcrosststamp() callback with ioctl(PTP_SYS_OFFSET_PRECISE),
so either the last bit of accuracy isn't all that important,
or you need to refine the interface to actually be an
improvement over the chardev.

      Arnd
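
As an illustration of the "read everything first, copy at the end"
idea above, here is a minimal userspace sketch. It is only an
analogy: clock_gettime() stands in for kc->clock_get_timespec(),
memcpy() stands in for the copy to userspace, and
SKETCH_MAX_CLOCKS and read_clocks_then_copy() are hypothetical
names that do not appear in the patch.

```c
#define _POSIX_C_SOURCE 200809L
#include <string.h>
#include <time.h>

#define SKETCH_MAX_CLOCKS 8	/* hypothetical bound for the on-stack array */

/*
 * Read all clocks back to back into a local array, then copy the
 * results out in one go. Returns 0 on success, -1 on bad arguments
 * or a failed clock read.
 */
static int read_clocks_then_copy(const clockid_t *ids,
				 unsigned int n_clocks,
				 struct timespec *out)
{
	struct timespec tmp[SKETCH_MAX_CLOCKS];
	unsigned int i;

	if (!ids || !out || !n_clocks || n_clocks > SKETCH_MAX_CLOCKS)
		return -1;

	/* Phase 1: nothing but the clock reads inside the loop. */
	for (i = 0; i < n_clocks; i++)
		if (clock_gettime(ids[i], &tmp[i]))
			return -1;

	/* Phase 2: a single copy, where a kernel version would copy to userspace. */
	memcpy(out, tmp, n_clocks * sizeof(tmp[0]));
	return 0;
}
```

In a kernel version, the clock ID lookups could be resolved in a
separate pass before the first phase, so that the timed loop
contains only the reads themselves.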