----- On Dec 16, 2015, at 5:09 PM, Mathieu Desnoyers mathieu.desnoyers@xxxxxxxxxxxx wrote:

> When testing liburcu on a 3.18 Linux kernel, 2-core MIPS (cpu model :
> Ingenic JZRISC V4.15 FPU V0.0), we notice that a blocked sys_futex
> FUTEX_WAIT returns -1, errno=ENOSYS when interrupted by a SA_RESTART
> signal handler. This spurious ENOSYS behavior causes hangs in liburcu
> 0.9.x. Running a MIPS 3.18 kernel under a QEMU emulator exhibits the
> same behavior. This might affect earlier kernels.
>
> This issue appears to be fixed in 3.18.y stable kernels and 3.19, but
> nevertheless, we should try to handle this kernel bug more gracefully
> than a user-space hang due to unexpected spurious ENOSYS return value.

It's actually fixed in 3.19, but not in 3.18.y stable kernels. The
Linux kernel upstream fix commit is:

e967ef02 "MIPS: Fix restart of indirect syscalls"

I've created a small test program that could also be used on
parisc to check if it suffers from the same issue (see attached).

On bogus mips kernels, we see the following output:

[OK] Test program with pid: 5748 SIGUSR1 handler
[FAIL] futex returns -1, Function not implemented

Let me know if someone can try it out on a parisc kernel.

Thanks!

Mathieu

>
> Therefore, fallback on the "async-safe" version of compat_futex in those
> situations where FUTEX_WAIT returns ENOSYS. This async-safe fallback has
> the nice property of being OK to use concurrently with other FUTEX_WAKE
> and FUTEX_WAIT futex() calls, because it's simply a busy-wait scheme.
>
> We suspect that parisc might be affected by a similar issue (Debian
> build bots reported a similar hang on both mips and parisc), but we do
> not have access to the hardware required to test this hypothesis.
>
> Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@xxxxxxxxxxxx>
> CC: Michael Jeanson <mjeanson@xxxxxxxxxxxx>
> CC: Paul E. McKenney <paulmck@xxxxxxxxxxxxxxxxxx>
> CC: Ralf Baechle <ralf@xxxxxxxxxxxxxx>
> CC: linux-mips@xxxxxxxxxxxxxx
> CC: linux-kernel@xxxxxxxxxxxxxxx
> CC: "James E.J. Bottomley" <jejb@xxxxxxxxxxxxxxxx>
> CC: Helge Deller <deller@xxxxxx>
> CC: linux-parisc@xxxxxxxxxxxxxxx
> ---
>  compat_futex.c |  2 ++
>  urcu/futex.h   | 12 +++++++++++-
>  2 files changed, 13 insertions(+), 1 deletion(-)
>
> diff --git a/compat_futex.c b/compat_futex.c
> index b7f78f0..9e918fe 100644
> --- a/compat_futex.c
> +++ b/compat_futex.c
> @@ -111,6 +111,8 @@ end:
>   * _ASYNC SIGNAL-SAFE_.
>   * For now, timeout, uaddr2 and val3 are unused.
>   * Waiter will busy-loop trying to read the condition.
> + * It is OK to use compat_futex_async() on a futex address on which
> + * futex() WAKE operations are also performed.
>   */
>
>  int compat_futex_async(int32_t *uaddr, int op, int32_t val,
> diff --git a/urcu/futex.h b/urcu/futex.h
> index 4d16cfa..a17eda8 100644
> --- a/urcu/futex.h
> +++ b/urcu/futex.h
> @@ -73,7 +73,17 @@ static inline int futex_noasync(int32_t *uaddr, int op,
> int32_t val,
>
>          ret = futex(uaddr, op, val, timeout, uaddr2, val3);
>          if (caa_unlikely(ret < 0 && errno == ENOSYS)) {
> -                return compat_futex_noasync(uaddr, op, val, timeout,
> +                /*
> +                 * The fallback on ENOSYS is the async-safe version of
> +                 * the compat futex implementation, because the
> +                 * async-safe compat implementation allows being used
> +                 * concurrently with calls to futex(). Indeed, sys_futex
> +                 * FUTEX_WAIT, on some architectures (e.g. mips), within
> +                 * a given process, spuriously return ENOSYS due to
> +                 * signal restart bugs on some kernel versions (e.g.
> +                 * Linux kernel 3.18 and possibly earlier).
> +                 */
> +                return compat_futex_async(uaddr, op, val, timeout,
>                                  uaddr2, val3);
>          }
>          return ret;
> --
> 2.1.4

-- 
Mathieu Desnoyers
EfficiOS Inc.
http://www.efficios.com
#define _GNU_SOURCE
#include <stdio.h>
#include <stdint.h>     /* int32_t */
#include <string.h>     /* strerror() */
#include <time.h>       /* struct timespec */
#include <sys/types.h>
#include <unistd.h>
#include <signal.h>
#include <stdlib.h>
#include <errno.h>
#include <sys/syscall.h>

/* Futex word. It always equals the value we wait on, so FUTEX_WAIT blocks. */
static int32_t value = -1;

#define FUTEX_WAIT      0
#define FUTEX_WAKE      1

static int futex(int32_t *uaddr, int op, int32_t val,
                const struct timespec *timeout, int32_t *uaddr2, int32_t val3)
{
        return syscall(__NR_futex, uaddr, op, val, timeout, uaddr2, val3);
}

/* fprintf() is not async-signal-safe; tolerated in this throwaway test. */
static void sighandler(int signo, siginfo_t *siginfo, void *context)
{
        fprintf(stderr, "[OK] Test program with pid: %d SIGUSR1 handler\n",
                getpid());
}

int main(int argc, char **argv)
{
        struct sigaction act;
        pid_t pid;
        int ret;

        fprintf(stderr, "Testing futex sigrestart. Stop with CTRL-c.\n");
        act.sa_sigaction = sighandler;
        act.sa_flags = SA_SIGINFO | SA_RESTART;
        /* Without SA_RESTART, the interrupted FUTEX_WAIT returns EINTR: */
        //act.sa_flags = SA_SIGINFO;
        sigemptyset(&act.sa_mask);
        ret = sigaction(SIGUSR1, &act, NULL);
        if (ret)
                abort();
        pid = fork();
        if (pid > 0) {
                /* parent: interrupt the child once per second */
                for (;;) {
                        ret = kill(pid, SIGUSR1);
                        if (ret) {
                                perror("kill");
                                abort();
                        }
                        sleep(1);
                }
        } else {
                if (pid < 0) {
                        abort();
                }
                /*
                 * child: with SA_RESTART the wait should be restarted
                 * transparently, so any return from futex() is a failure.
                 */
                for (;;) {
                        ret = futex(&value, FUTEX_WAIT, -1, NULL, NULL, 0);
                        if (ret < 0) {
                                fprintf(stderr, "[FAIL] futex returns %d, %s\n",
                                        ret, strerror(errno));
                        } else {
                                fprintf(stderr,
                                        "[FAIL] futex returns %d (unexpected)\n",
                                        ret);
                        }
                }
        }
        return 0;
}
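
For readers who want to see the idea of the quoted patch in isolation, here is a minimal sketch of an ENOSYS fallback onto a busy-wait scheme. It is not the liburcu code: futex_fn, busy_wait_futex(), futex_with_fallback() and enosys_futex() are hypothetical names made up for this illustration, the futex call is passed in as a function pointer only so that a buggy kernel can be simulated with a stub, and the sketch assumes GCC or Clang (for the __atomic_load_n builtin) and sched_yield() as the back-off inside the polling loop.

```c
#include <assert.h>
#include <errno.h>
#include <sched.h>
#include <stddef.h>
#include <stdint.h>

#define FUTEX_WAIT      0
#define FUTEX_WAKE      1

/* Hypothetical hook type standing in for a real futex() wrapper. */
typedef int (*futex_fn)(int32_t *uaddr, int op, int32_t val);

/*
 * Hypothetical busy-wait fallback. A waiter polls the futex word until
 * it no longer holds the expected value. A waker has nothing to do,
 * because waiters notice the change by themselves. Since no kernel
 * wait queue is involved, this can coexist with real futex() calls
 * made on the same address.
 */
static int busy_wait_futex(int32_t *uaddr, int op, int32_t val)
{
        assert(uaddr != NULL);
        switch (op) {
        case FUTEX_WAIT:
                /* __atomic_load_n is a GCC/Clang builtin (assumption). */
                while (__atomic_load_n(uaddr, __ATOMIC_SEQ_CST) == val)
                        sched_yield();
                return 0;
        case FUTEX_WAKE:
                return 0;
        default:
                errno = EINVAL;
                return -1;
        }
}

/* Try the supplied futex call first; on ENOSYS, fall back to polling. */
static int futex_with_fallback(futex_fn do_futex, int32_t *uaddr, int op,
                int32_t val)
{
        int ret = do_futex(uaddr, op, val);

        if (ret < 0 && errno == ENOSYS)
                return busy_wait_futex(uaddr, op, val);
        return ret;
}

/* Stub simulating the buggy kernel: every call fails with ENOSYS. */
static int enosys_futex(int32_t *uaddr, int op, int32_t val)
{
        (void) uaddr;
        (void) op;
        (void) val;
        errno = ENOSYS;
        return -1;
}
```

Calling futex_with_fallback(enosys_futex, &word, FUTEX_WAIT, expected) then behaves like a wait that returns once word differs from expected, at the cost of polling instead of sleeping in the kernel. In a real program the first argument would wrap the actual system call.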