Hi Joanne,

On (24/11/14 11:13), Joanne Koong wrote:
> There are situations where fuse servers can become unresponsive or
> stuck, for example if the server is deadlocked. Currently, there's no
> good way to detect if a server is stuck and needs to be killed manually.
>
> This commit adds an option for enforcing a timeout (in minutes) for
> requests where if the timeout elapses without the server responding to
> the request, the connection will be automatically aborted.

Does it make sense to configure the timeout in seconds?  The hung-task
watchdog operates in seconds and can be set to anything, e.g. 45 seconds,
so it can panic the system before the fuse timeout has a chance to trigger.

Another question: this will terminate the connection.  Does it make
sense to run the timeout per request and just "abort" individual requests?
What I'm currently playing with here on our side is something like this:

----

diff --git a/fs/fuse/dev.c b/fs/fuse/dev.c
index 8573d79ef29c..82e071cecafd 100644
--- a/fs/fuse/dev.c
+++ b/fs/fuse/dev.c
@@ -21,6 +21,7 @@
 #include <linux/swap.h>
 #include <linux/splice.h>
 #include <linux/sched.h>
+#include <linux/sched/sysctl.h>
 
 MODULE_ALIAS_MISCDEV(FUSE_MINOR);
 MODULE_ALIAS("devname:fuse");
@@ -368,11 +369,24 @@ static void request_wait_answer(struct fuse_req *req)
 	int err;
 
 	if (!fc->no_interrupt) {
-		/* Any signal may interrupt this */
-		err = wait_event_interruptible(req->waitq,
+		/* We can use CONFIG_DEFAULT_HUNG_TASK_TIMEOUT here */
+		unsigned long hang_check = sysctl_hung_task_timeout_secs;
+
+		if (hang_check) {
+			/* Any signal or timeout may interrupt this */
+			err = wait_event_interruptible_timeout(req->waitq,
+					test_bit(FR_FINISHED, &req->flags),
+					hang_check * (HZ / 2));
+			if (err > 0)
+				return;
+		} else {
+			/* Any signal may interrupt this */
+			err = wait_event_interruptible(req->waitq,
 					test_bit(FR_FINISHED, &req->flags));
-		if (!err)
-			return;
+
+			if (!err)
+				return;
+		}
 
 		set_bit(FR_INTERRUPTED, &req->flags);
 		/* matches barrier in fuse_dev_do_read() */
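
For reference, the wait in the patch above is bounded at half the hung-task
timeout, so the request is interrupted before the watchdog can fire. Below is
a minimal userspace sketch of that arithmetic and of how the three outcomes of
wait_event_interruptible_timeout() are told apart. MODEL_HZ, timeout_jiffies()
and classify_wait() are hypothetical names used only for illustration, not
kernel symbols, and the tick rate of 250 is an assumption:

```c
/* Assumed tick rate for illustration; the real HZ is a kernel build option. */
#define MODEL_HZ 250UL

/* Mirrors "hang_check * (HZ / 2)": half the hung-task timeout, in jiffies. */
static unsigned long timeout_jiffies(unsigned long hang_check_secs)
{
	return hang_check_secs * (MODEL_HZ / 2);
}

enum wait_outcome { WAIT_FINISHED, WAIT_TIMED_OUT, WAIT_SIGNALLED };

/*
 * wait_event_interruptible_timeout() returns > 0 when the condition became
 * true, 0 when the timeout elapsed with the condition still false, and a
 * negative errno (-ERESTARTSYS) when a signal arrived.
 */
static enum wait_outcome classify_wait(long ret)
{
	if (ret > 0)
		return WAIT_FINISHED;
	if (ret == 0)
		return WAIT_TIMED_OUT;
	return WAIT_SIGNALLED;
}
```

With a hung-task timeout of 120 seconds and the assumed tick rate of 250,
timeout_jiffies(120) is 15000 jiffies, i.e. 60 seconds. Only the "> 0" case
returns early in the patch; both the timeout and the signal case fall through
to set FR_INTERRUPTED.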