Re: [PATCH v3] posix-timers: add multi_clock_gettime system call

Sagi Maimon <maimon.sagi@xxxxxxxxx> · Sun, 31 Dec 2023 18:00:17 +0200



On Fri, Dec 29, 2023 at 5:27 PM Arnd Bergmann <arnd@xxxxxxxx> wrote:
>
> On Thu, Dec 28, 2023, at 13:24, Sagi Maimon wrote:
> > Some user space applications need to read some clocks.
> > Each read requires moving from user space to kernel space.
> > The syscall overhead causes unpredictable delay between N clocks reads
> > Removing this delay causes better synchronization between N clocks.
> >
> > Introduce a new system call multi_clock_gettime, which can be used to measure
> > the offset between multiple clocks, from variety of types: PHC, virtual PHC
> > and various system clocks (CLOCK_REALTIME, CLOCK_MONOTONIC, etc).
> > The offset includes the total time that the driver needs to read the clock
> > timestamp.
> >
> > New system call allows the reading of a list of clocks - up to PTP_MAX_CLOCKS.
> > Supported clocks IDs: PHC, virtual PHC and various system clocks.
> > Up to PTP_MAX_SAMPLES times (per clock) in a single system call read.
> > The system call returns n_clocks timestamps for each measurement:
> > - clock 0 timestamp
> > - ...
> > - clock n timestamp
> >
> > Signed-off-by: Sagi Maimon <maimon.sagi@xxxxxxxxx>
>
> Hi Sagi,
>
> Exposing an interface to read multiple clocks makes sense to me,
> but I wonder if the interface you use is too inflexible.
>
Arnd - thanks alot for your notes.
> > --- a/include/uapi/asm-generic/unistd.h
> > +++ b/include/uapi/asm-generic/unistd.h
> > @@ -828,9 +828,11 @@ __SYSCALL(__NR_futex_wake, sys_futex_wake)
> >  __SYSCALL(__NR_futex_wait, sys_futex_wait)
> >  #define __NR_futex_requeue 456
> >  __SYSCALL(__NR_futex_requeue, sys_futex_requeue)
> > +#define __NR_multi_clock_gettime 457
> > +__SYSCALL(__NR_multi_clock_gettime, sys_multi_clock_gettime)
> >
> >  #undef __NR_syscalls
> > -#define __NR_syscalls 457
> > +#define __NR_syscalls 458
>
> Site note: hooking it up only here is sufficient for the
> code review but not for inclusion: once we have an agreement
> on the API, this should be added to all architectures at once.
>
> > +#define MULTI_PTP_MAX_CLOCKS 5 /* Max number of clocks */
> > +#define MULTI_PTP_MAX_SAMPLES 10 /* Max allowed offset measurement samples. */
> > +
> > +struct __ptp_multi_clock_get {
> > +     unsigned int n_clocks; /* Desired number of clocks. */
> > +     unsigned int n_samples; /* Desired number of measurements per clock. */
> > +     clockid_t clkid_arr[MULTI_PTP_MAX_CLOCKS]; /* list of clock IDs */
> > +     /*
> > +      * Array of list of n_clocks clocks time samples n_samples times.
> > +      */
> > +     struct  __kernel_timespec ts[MULTI_PTP_MAX_SAMPLES][MULTI_PTP_MAX_CLOCKS];
> > +};
>
> The fixed size arrays here seem to be an unnecessary limitation,
> both MULTI_PTP_MAX_SAMPLES and MULTI_PTP_MAX_CLOCKS are small
> enough that one can come up with scenarios where you would want
> a higher number, but at the same time the structure is already
> 808 bytes long, which is more than you'd normally want to put
> on the kernel stack, and which may take a significant time to
> copy to and from userspace.
>
> Since n_clocks and n_samples are always inputs to the syscall,
> you can just pass them as register arguments and use a dynamically
> sized array instead.
>
Both MULTI_PTP_MAX_SAMPLES and MULTI_PTP_MAX_CLOCKS are enough of any
usage we can think of,
But I think you are right, it is better to use a dynamically sized
array for future use, plus to use less stack memory.
On patch v4 a dynamically sized array will be used .
I leaving both MULTI_PTP_MAX_SAMPLES and MULTI_PTP_MAX_CLOCKS but
increasing their values, since there should be some limitation.
> It's not clear to me what you gain from having the n_samples
> argument over just calling the syscall repeatedly. Does
> this offer a benefit for accuracy or is this just meant to
> avoid syscall overhead.
It is mainly to avoid syscall overhead which also slightly improve the accuracy.
> > +SYSCALL_DEFINE1(multi_clock_gettime, struct __ptp_multi_clock_get
> > __user *, ptp_multi_clk_get)
> > +{
> > +     const struct k_clock *kc;
> > +     struct timespec64 kernel_tp;
> > +     struct __ptp_multi_clock_get multi_clk_get;
> > +     unsigned int i, j;
> > +     int error;
> > +
> > +     if (copy_from_user(&multi_clk_get, ptp_multi_clk_get,
> > sizeof(multi_clk_get)))
> > +             return -EFAULT;
>
> Here you copy the entire structure from userspace, but
> I don't actually see the .ts[] array on the stack being
> accessed later as you just copy to the user pointer
> directly.
>
you are right, will be fixed on patch v4.
> > +             for (i = 0; i < multi_clk_get.n_clocks; i++) {
> > +                     kc = clockid_to_kclock(multi_clk_get.clkid_arr[i]);
> > +                     if (!kc)
> > +                             return -EINVAL;
> > +                     error = kc->clock_get_timespec(multi_clk_get.clkid_arr[i],
> > &kernel_tp);
> > +                     if (!error && put_timespec64(&kernel_tp, (struct __kernel_timespec
> > __user *)
> > +                                                  &ptp_multi_clk_get->ts[j][i]))
> > +                             error = -EFAULT;
> > +             }
>
> The put_timespec64() and possibly the clockid_to_kclock() have
> some overhead that may introduce jitter, so it may be better to
> pull that out of the loop and have a fixed-size array
> of timespec64 values on the stack and then copy them
> at the end.
>
This overhead may introduce marginal jitter in my opinion,
still I like your idea since it is better to remove any overhead.
This will be fixed in patch v4.
I don't intend to use "a fixed-size array of timespec64 values on the
stack" since it will cause stack memory overflow,
Instead I will use a dynamically sized array (according to your
previews advice).
> On the other hand, this will still give less accuracy than the
> getcrosststamp() callback with ioctl(PTP_SYS_OFFSET_PRECISE),
> so either the last bit of accuracy isn't all that important,
> or you need to refine the interface to actually be an
> improvement over the chardev.
>
I don't understand this comment, please explain.
The ioctl(PTP_SYS_OFFSET_PRECISE) is one specific case that can be
done by multi_clock_gettime syscall (which cover many more cases)
Plus the ioctl(PTP_SYS_OFFSET_PRECISE) works only on drivers that
support this feature.
>       Arnd