Re: [RFC PATCH urcu on mips, parisc] Fix: compat_futex should work-around futex signal-restart kernel bug

----- On Dec 16, 2015, at 5:09 PM, Mathieu Desnoyers mathieu.desnoyers@xxxxxxxxxxxx wrote:

> When testing liburcu on a 3.18 Linux kernel, 2-core MIPS (cpu model :
> Ingenic JZRISC V4.15  FPU V0.0), we notice that a blocked sys_futex
> FUTEX_WAIT returns -1, errno=ENOSYS when interrupted by a SA_RESTART
> signal handler. This spurious ENOSYS behavior causes hangs in liburcu
> 0.9.x. Running a MIPS 3.18 kernel under a QEMU emulator exhibits the
> same behavior. This might affect earlier kernels.
> 
> This issue appears to be fixed in 3.18.y stable kernels and 3.19, but
> nevertheless, we should try to handle this kernel bug more gracefully
> than a user-space hang due to unexpected spurious ENOSYS return value.

It's actually fixed in 3.19, but not in 3.18.y stable kernels. The
Linux kernel upstream fix commit is:
e967ef02 "MIPS: Fix restart of indirect syscalls"

I've created a small test program that could also be used on parisc
to check if it suffers from the same issue (see attached).

On affected MIPS kernels, the test program prints the following output:
[OK] Test program with pid: 5748 SIGUSR1 handler
[FAIL] futex returns -1, Function not implemented

Let me know if someone can try it out on a parisc kernel.

Thanks!

Mathieu

> 
> Therefore, fall back on the "async-safe" version of compat_futex in those
> situations where FUTEX_WAIT returns ENOSYS. This async-safe fallback has
> the nice property of being OK to use concurrently with other FUTEX_WAKE
> and FUTEX_WAIT futex() calls, because it's simply a busy-wait scheme.
> 
> We suspect that parisc might be affected by a similar issue (Debian
> build bots reported a similar hang on both mips and parisc), but we do
> not have access to the hardware required to test this hypothesis.
> 
> Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@xxxxxxxxxxxx>
> CC: Michael Jeanson <mjeanson@xxxxxxxxxxxx>
> CC: Paul E. McKenney <paulmck@xxxxxxxxxxxxxxxxxx>
> CC: Ralf Baechle <ralf@xxxxxxxxxxxxxx>
> CC: linux-mips@xxxxxxxxxxxxxx
> CC: linux-kernel@xxxxxxxxxxxxxxx
> CC: "James E.J. Bottomley" <jejb@xxxxxxxxxxxxxxxx>
> CC: Helge Deller <deller@xxxxxx>
> CC: linux-parisc@xxxxxxxxxxxxxxx
> ---
> compat_futex.c |  2 ++
> urcu/futex.h   | 12 +++++++++++-
> 2 files changed, 13 insertions(+), 1 deletion(-)
> 
> diff --git a/compat_futex.c b/compat_futex.c
> index b7f78f0..9e918fe 100644
> --- a/compat_futex.c
> +++ b/compat_futex.c
> @@ -111,6 +111,8 @@ end:
>  * _ASYNC SIGNAL-SAFE_.
>  * For now, timeout, uaddr2 and val3 are unused.
>  * Waiter will busy-loop trying to read the condition.
> + * It is OK to use compat_futex_async() on a futex address on which
> + * futex() WAKE operations are also performed.
>  */
> 
> int compat_futex_async(int32_t *uaddr, int op, int32_t val,
> diff --git a/urcu/futex.h b/urcu/futex.h
> index 4d16cfa..a17eda8 100644
> --- a/urcu/futex.h
> +++ b/urcu/futex.h
> @@ -73,7 +73,17 @@ static inline int futex_noasync(int32_t *uaddr, int op, int32_t val,
> 
> 	ret = futex(uaddr, op, val, timeout, uaddr2, val3);
> 	if (caa_unlikely(ret < 0 && errno == ENOSYS)) {
> -		return compat_futex_noasync(uaddr, op, val, timeout,
> +		/*
> +		 * The fallback on ENOSYS is the async-safe version of
> +		 * the compat futex implementation, because the
> +		 * async-safe compat implementation allows being used
> +		 * concurrently with calls to futex(). Indeed, sys_futex
> +		 * FUTEX_WAIT, on some architectures (e.g. mips), within
> +		 * a given process, spuriously return ENOSYS due to
> +		 * signal restart bugs on some kernel versions (e.g.
> +		 * Linux kernel 3.18 and possibly earlier).
> +		 */
> +		return compat_futex_async(uaddr, op, val, timeout,
> 				uaddr2, val3);
> 	}
> 	return ret;
> --
> 2.1.4

-- 
Mathieu Desnoyers
EfficiOS Inc.
http://www.efficios.com
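#define _GNU_SOURCE
#include <stdio.h>
#include <stdint.h>	/* int32_t */
#include <string.h>	/* strerror() */
#include <time.h>	/* struct timespec */
#include <sys/types.h>
#include <unistd.h>
#include <signal.h>
#include <stdlib.h>
#include <errno.h>
#include <sys/syscall.h>

/* Futex word: the child blocks in FUTEX_WAIT while it still holds -1. */
static int32_t value = -1;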

#define FUTEX_WAIT		0
#define FUTEX_WAKE		1

static int futex(int32_t *uaddr, int op, int32_t val,
		const struct timespec *timeout, int32_t *uaddr2, int32_t val3)
{
	return syscall(__NR_futex, uaddr, op, val, timeout,
			uaddr2, val3);
}

static void sighandler(int signo, siginfo_t *siginfo, void *context)
{
	fprintf(stderr, "[OK] Test program with pid: %d SIGUSR1 handler\n",
		getpid());
}

int main(int argc, char **argv)
{
	struct sigaction act;
	pid_t pid;
	int ret;

	fprintf(stderr, "Testing futex sigrestart. Stop with CTRL-c.\n");
	act.sa_sigaction = sighandler;
	act.sa_flags = SA_SIGINFO | SA_RESTART;
	//act.sa_flags = SA_SIGINFO;
	sigemptyset(&act.sa_mask);
	ret = sigaction(SIGUSR1, &act, NULL);
	if (ret)
		abort();

	pid = fork();
	if (pid > 0) {
		/* parent */
		for (;;) {
			ret = kill(pid, SIGUSR1);
			if (ret) {
				perror("kill");
				abort();
			}
			sleep(1);
		}
	} else {
		if (pid < 0) {
			abort();
		}
		/* child */
		for (;;) {
			ret = futex(&value, FUTEX_WAIT, -1, NULL, NULL, 0);
			if (ret < 0) {
				fprintf(stderr, "[FAIL] futex returns %d, %s\n",
					ret, strerror(errno));
			} else {
				fprintf(stderr, "[FAIL] futex returns %d (unexpected)\n",
					ret);
			}
		}
	}

	return 0;
}
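
For reference, the user-space workaround described in the patch above (fall back to a busy-wait when FUTEX_WAIT spuriously fails with ENOSYS) can be sketched as follows. This is a minimal illustration under stated assumptions, not the liburcu code: the helper names busy_wait_while_equal() and futex_wait_enosys_fallback() are hypothetical, the 10 ms polling interval is an arbitrary choice, and timeout, uaddr2 and val3 are left out for brevity.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <poll.h>
#include <stdint.h>
#include <sys/syscall.h>
#include <unistd.h>

#define FUTEX_WAIT	0

/*
 * Hypothetical helper: poll the futex word until it no longer holds
 * val. It makes no system call on the futex address itself, so a
 * concurrent futex() WAKE on the same address is harmless.
 */
static int busy_wait_while_equal(int32_t *uaddr, int32_t val)
{
	while (__atomic_load_n(uaddr, __ATOMIC_SEQ_CST) == val)
		(void) poll(NULL, 0, 10);	/* sleep about 10 ms, retry on EINTR */
	return 0;
}

/*
 * Hypothetical wrapper: try the real futex system call first and use
 * the busy-wait only when the kernel reports ENOSYS, which on affected
 * kernels may be a spurious result of a signal restart.
 */
static int futex_wait_enosys_fallback(int32_t *uaddr, int32_t val)
{
	int ret;

	ret = syscall(__NR_futex, uaddr, FUTEX_WAIT, val, NULL, NULL, 0);
	if (ret < 0 && errno == ENOSYS)
		return busy_wait_while_equal(uaddr, val);
	return ret;
}
```

In the test program above, the child could call futex_wait_enosys_fallback(&value, -1) instead of futex() to check that the wait no longer fails with ENOSYS on an affected kernel. Other errors, such as EAGAIN when the futex word already differs from val, are still passed through to the caller.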
