On Tue, Oct 26, 2021 at 2:34 AM André Almeida <andrealmeid@xxxxxxxxxxxxx> wrote: > > Hi Alistair, > > Às 02:54 de 21/10/21, Alistair Francis escreveu: > > From: Alistair Francis <alistair.francis@xxxxxxx> > > > > This commit adds two futex syscall wrappers that are exposed to > > userspace. > > > > Neither the kernel or glibc currently expose a futex wrapper, so > > userspace is left performing raw syscalls. This has mostly been becuase > > because > > > the overloading of one of the arguments makes it impossible to provide a > > single type safe function. > > > > Until recently the single syscall has worked fine. With the introduction > > of a 64-bit time_t futex call on 32-bit architectures, this has become > > more complex. The logic of handling the two possible futex syscalls is > > complex and often implemented incorrectly. > > > > This patch adds two futux syscall functions that correctly handle the > > time_t complexity for userspace. > > > > This idea is based on previous discussions: https://lkml.org/lkml/2021/9/21/143 > > I would use lore > https://lore.kernel.org/lkml/CAK8P3a3x_EyCiPDpMK54y=Rtm-Wb08ym2TNiuAZgXhYrThcWTw@xxxxxxxxxxxxxx/ > > > > > Signed-off-by: Alistair Francis <alistair.francis@xxxxxxx> > > Thanks for working on that :) > > > --- > > include/uapi/linux/futex_syscall.h | 81 ++++++++++++++++++++++++++++++ > > 1 file changed, 81 insertions(+) > > create mode 100644 include/uapi/linux/futex_syscall.h > > > > diff --git a/include/uapi/linux/futex_syscall.h b/include/uapi/linux/futex_syscall.h > > new file mode 100644 > > index 0000000000000..f84a0c68baf78 > > --- /dev/null > > +++ b/include/uapi/linux/futex_syscall.h > > @@ -0,0 +1,81 @@ > > +/* SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note */ > > +#ifndef _UAPI_LINUX_FUTEX_SYSCALL_H > > +#define _UAPI_LINUX_FUTEX_SYSCALL_H > > + > > +#include <asm/unistd.h> > > +#include <errno.h> > > +#include <linux/types.h> > > +#include <linux/time_types.h> > > +#include <sys/syscall.h> > > + > > +/** > > + * futex_syscall_timeout() - __NR_futex/__NR_futex_time64 syscall wrapper > > + * @uaddr: address of first futex > > + * @op: futex op code > > + * @val: typically expected value of uaddr, but varies by op > > + * @timeout: an absolute struct timespec > > + * @uaddr2: address of second futex for some ops > > + * @val3: varies by op > > + */ > > +static inline int > > +__kernel_futex_syscall_timeout(volatile u_int32_t *uaddr, int op, u_int32_t val, > > + struct timespec *timeout, volatile u_int32_t *uaddr2, int val3) > > I tried to write an example[0] that uses this header, but I can't > compile given that u_int32_t isn't defined. Maybe change to uint32_t and > include <stdint.h>? > > Also, I got some invalid use of undefined type 'struct timespec', and > #include <time.h> solved. > > [0] https://paste.debian.net/1216834/ > > > +{ > > +#if defined(__NR_futex_time64) > > + if (sizeof(*timeout) != sizeof(struct __kernel_old_timespec)) { > > + int ret = syscall(__NR_futex_time64, uaddr, op, val, timeout, uaddr2, val3); > > + > > + if (ret == 0 || errno != ENOSYS) > > + return ret; > > + } > > +#endif > > + > > +#if defined(__NR_futex) > > + if (sizeof(*timeout) == sizeof(struct __kernel_old_timespec)) > > + return syscall(__NR_futex, uaddr, op, val, timeout, uaddr2, val3); > > + > > + if (timeout && timeout->tv_sec == (long)timeout->tv_sec) { > > + struct __kernel_old_timespec ts32; > > + > > + ts32.tv_sec = (__kernel_long_t) timeout->tv_sec;> + ts32.tv_nsec = (__kernel_long_t) timeout->tv_nsec; > > + > > + return syscall(__NR_futex, uaddr, op, val, &ts32, uaddr2, val3); > > + } else if (!timeout) { > > + return syscall(__NR_futex, uaddr, op, val, NULL, uaddr2, val3); > > + } > > +#endif > > If I read this part right, you will always use ts32 for __NR_futex. I > know that it can be misleading, but __NR_futex uses ts64 in 64-bit > archs, so they shouldn't be converted to ts32 in those cases. > > Just to make it clear, there's no __NR_futex_time64 at 64-bit archs. > > > + > > + errno = ENOSYS; > > + return -1; > > +} > > + > > +/** > > + * futex_syscall_nr_requeue() - __NR_futex/__NR_futex_time64 syscall wrapper > > + * @uaddr: address of first futex > > + * @op: futex op code > > + * @val: typically expected value of uaddr, but varies by op > > + * @nr_requeue: an op specific meaning > > + * @uaddr2: address of second futex for some ops > > + * @val3: varies by op > > + */ > > +static inline int > > +__kernel_futex_syscall_nr_requeue(volatile u_int32_t *uaddr, int op, u_int32_t val, > > + u_int32_t nr_requeue, volatile u_int32_t *uaddr2, int val3) > > I would always assume that op is FUTEX_CMP_REQUEUE, given that > FUTEX_REQUEUE is racy. From `man futex`: There are other ops that this could be though. From just the kernel futex self tests it could be FUTEX_WAKE_OP, FUTEX_WAIT_REQUEUE_PI or FUTEX_CMP_REQUEUE_PI Alistair > > The FUTEX_CMP_REQUEUE operation was added as a replacement for the > earlier FUTEX_REQUEUE. The difference is that the check of the value at > uaddr can be used to ensure that requeueing happens only under certain > conditions, which allows race conditions to be avoided in certain use cases. > > And then we can drop `int op` from the args and give defined > descriptions for the args. >