Re: [PATCH v2] fuse: Abort connection if FUSE server get stuck

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Wed, Dec 11, 2024 at 11:45 AM <etmartin4313@xxxxxxxxx> wrote:
>
> From: Etienne Martineau <etmartin4313@xxxxxxxxx>
>
> This patch abort connection if HUNG_TASK_PANIC is set and a FUSE server
> is getting stuck for too long. A slow FUSE server may tripped the
> hang check timer for legitimate reasons hence consider disabling
> HUNG_TASK_PANIC in that scenario.
>
> Without this patch, an unresponsive / buggy / malicious FUSE server can
> leave the clients in D state for a long period of time and on system where
> HUNG_TASK_PANIC is set, trigger a catastrophic reload.
>
> So, if HUNG_TASK_PANIC checking is enabled, we should wake up periodically
> to abort connections that exceed the timeout value which is define to be
> half the HUNG_TASK_TIMEOUT period, which keeps overhead low. The timer
> is per connection and runs only if there are active FUSE request pending.

Hi Etienne,

For your use case, does the generic request timeouts logic and
max_request_timeout systemctl implemented in [1] and [2] not suffice?
IMO I don't think we should have logic specifically checking for hung
task timeouts in fuse, if the generic solution can be used.

Thanks,
Joanne

[1] https://lore.kernel.org/linux-fsdevel/20241114191332.669127-1-joannelkoong@xxxxxxxxx/
[2] https://lore.kernel.org/linux-fsdevel/20241114191332.669127-4-joannelkoong@xxxxxxxxx/

>
> A FUSE client can get into D state as such ( see below scenario #1 / #2 )
>  1) request_wait_answer() -> wait_event() is UNINTERRUPTIBLE
>     OR
>  2) request_wait_answer() -> wait_event_(interruptible / killable) is head
>     of line blocking for subsequent clients accessing the same file
>
>         scenario #1:
>         2716 pts/2    D+     0:00 cat
>         $ cat /proc/2716/stack
>                 [<0>] request_wait_answer+0x22e/0x340
>                 [<0>] __fuse_simple_request+0xd8/0x2c0
>                 [<0>] fuse_perform_write+0x3ec/0x760
>                 [<0>] fuse_file_write_iter+0x3d5/0x3f0
>                 [<0>] vfs_write+0x313/0x430
>                 [<0>] ksys_write+0x6a/0xf0
>                 [<0>] __x64_sys_write+0x19/0x30
>                 [<0>] x64_sys_call+0x18ab/0x26f0
>                 [<0>] do_syscall_64+0x7c/0x170
>                 [<0>] entry_SYSCALL_64_after_hwframe+0x76/0x7e
>
>         scenario #2:
>         2962 pts/2    S+     0:00 cat
>         2963 pts/3    D+     0:00 cat
>         $ cat /proc/2962/stack
>                 [<0>] request_wait_answer+0x140/0x340
>                 [<0>] __fuse_simple_request+0xd8/0x2c0
>                 [<0>] fuse_perform_write+0x3ec/0x760
>                 [<0>] fuse_file_write_iter+0x3d5/0x3f0
>                 [<0>] vfs_write+0x313/0x430
>                 [<0>] ksys_write+0x6a/0xf0
>                 [<0>] __x64_sys_write+0x19/0x30
>                 [<0>] x64_sys_call+0x18ab/0x26f0
>                 [<0>] do_syscall_64+0x7c/0x170
>                 [<0>] entry_SYSCALL_64_after_hwframe+0x76/0x7e
>         $ cat /proc/2963/stack
>                 [<0>] fuse_file_write_iter+0x252/0x3f0
>                 [<0>] vfs_write+0x313/0x430
>                 [<0>] ksys_write+0x6a/0xf0
>                 [<0>] __x64_sys_write+0x19/0x30
>                 [<0>] x64_sys_call+0x18ab/0x26f0
>                 [<0>] do_syscall_64+0x7c/0x170
>                 [<0>] entry_SYSCALL_64_after_hwframe+0x76/0x7e
>
> Please note that this patch doesn't prevent the HUNG_TASK_WARNING.
>
> Signed-off-by: Etienne Martineau <etmartin4313@xxxxxxxxx>
> ---
>  fs/fuse/dev.c                | 100 +++++++++++++++++++++++++++++++++++
>  fs/fuse/fuse_i.h             |   8 +++
>  fs/fuse/inode.c              |   3 ++
>  include/linux/sched/sysctl.h |   8 ++-
>  kernel/hung_task.c           |   3 +-
>  5 files changed, 119 insertions(+), 3 deletions(-)
>
> diff --git a/fs/fuse/dev.c b/fs/fuse/dev.c
> index 27ccae63495d..73d19de14e51 100644
> --- a/fs/fuse/dev.c
> +++ b/fs/fuse/dev.c
> @@ -21,6 +21,8 @@
>  #include <linux/swap.h>
>  #include <linux/splice.h>
>  #include <linux/sched.h>
> +#include <linux/completion.h>
> +#include <linux/sched/sysctl.h>
>
>  #define CREATE_TRACE_POINTS
>  #include "fuse_trace.h"
> @@ -45,14 +47,104 @@ static struct fuse_dev *fuse_get_dev(struct file *file)
>         return READ_ONCE(file->private_data);
>  }
>
> +static bool request_expired(struct fuse_conn *fc, struct fuse_req *req,
> +               int timeout)
> +{
> +       return time_after(jiffies, req->create_time + timeout);
> +}
> +
> +/*
> + * Prevent hung task timer from firing at us
> + * Check if any requests aren't being completed by the specified request
> + * timeout. To do so, we:
> + * - check the fiq pending list
> + * - check the bg queue
> + * - check the fpq io and processing lists
> + *
> + * To make this fast, we only check against the head request on each list since
> + * these are generally queued in order of creation time (eg newer requests get
> + * queued to the tail). We might miss a few edge cases (eg requests transitioning
> + * between lists, re-sent requests at the head of the pending list having a
> + * later creation time than other requests on that list, etc.) but that is fine
> + * since if the request never gets fulfilled, it will eventually be caught.
> + */
> +void fuse_check_timeout(struct work_struct *wk)
> +{
> +       unsigned long hang_check_timer = sysctl_hung_task_timeout_secs * (HZ / 2);
> +       struct fuse_conn *fc = container_of(wk, struct fuse_conn, work.work);
> +       struct fuse_iqueue *fiq = &fc->iq;
> +       struct fuse_req *req;
> +       struct fuse_dev *fud;
> +       struct fuse_pqueue *fpq;
> +       bool expired = false;
> +       int i;
> +
> +       spin_lock(&fiq->lock);
> +       req = list_first_entry_or_null(&fiq->pending, struct fuse_req, list);
> +       if (req)
> +               expired = request_expired(fc, req, hang_check_timer);
> +       spin_unlock(&fiq->lock);
> +       if (expired)
> +               goto abort_conn;
> +
> +       spin_lock(&fc->bg_lock);
> +       req = list_first_entry_or_null(&fc->bg_queue, struct fuse_req, list);
> +       if (req)
> +               expired = request_expired(fc, req, hang_check_timer);
> +       spin_unlock(&fc->bg_lock);
> +       if (expired)
> +               goto abort_conn;
> +
> +       spin_lock(&fc->lock);
> +       if (!fc->connected) {
> +               spin_unlock(&fc->lock);
> +               return;
> +       }
> +       list_for_each_entry(fud, &fc->devices, entry) {
> +               fpq = &fud->pq;
> +               spin_lock(&fpq->lock);
> +               req = list_first_entry_or_null(&fpq->io, struct fuse_req, list);
> +               if (req && request_expired(fc, req, hang_check_timer))
> +                       goto fpq_abort;
> +
> +               for (i = 0; i < FUSE_PQ_HASH_SIZE; i++) {
> +                       req = list_first_entry_or_null(&fpq->processing[i], struct fuse_req, list);
> +                       if (req && request_expired(fc, req, hang_check_timer))
> +                               goto fpq_abort;
> +               }
> +               spin_unlock(&fpq->lock);
> +       }
> +       /* Keep the ball rolling */
> +       if (atomic_read(&fc->num_waiting) != 0)
> +               queue_delayed_work(system_wq, &fc->work, hang_check_timer);
> +       spin_unlock(&fc->lock);
> +       return;
> +
> +fpq_abort:
> +       spin_unlock(&fpq->lock);
> +       spin_unlock(&fc->lock);
> +abort_conn:
> +       fuse_abort_conn(fc);
> +}
> +
>  static void fuse_request_init(struct fuse_mount *fm, struct fuse_req *req)
>  {
> +       struct fuse_conn *fc = fm->fc;
>         INIT_LIST_HEAD(&req->list);
>         INIT_LIST_HEAD(&req->intr_entry);
>         init_waitqueue_head(&req->waitq);
>         refcount_set(&req->count, 1);
>         __set_bit(FR_PENDING, &req->flags);
>         req->fm = fm;
> +       req->create_time = jiffies;
> +
> +       if (sysctl_hung_task_panic) {
> +               spin_lock(&fc->lock);
> +               /* Get the ball rolling */
> +               queue_delayed_work(system_wq, &fc->work,
> +                               sysctl_hung_task_timeout_secs * (HZ / 2));
> +               spin_unlock(&fc->lock);
> +       }
>  }
>
>  static struct fuse_req *fuse_request_alloc(struct fuse_mount *fm, gfp_t flags)
> @@ -200,6 +292,14 @@ static void fuse_put_request(struct fuse_req *req)
>                         fuse_drop_waiting(fc);
>                 }
>
> +               if (sysctl_hung_task_panic) {
> +                       spin_lock(&fc->lock);
> +                       /* Stop the timeout check if we are the last request */
> +                       if (atomic_read(&fc->num_waiting) == 0)
> +                               cancel_delayed_work_sync(&fc->work);
> +                       spin_unlock(&fc->lock);
> +               }
> +
>                 fuse_request_free(req);
>         }
>  }
> diff --git a/fs/fuse/fuse_i.h b/fs/fuse/fuse_i.h
> index 74744c6f2860..aba3ffd0fb67 100644
> --- a/fs/fuse/fuse_i.h
> +++ b/fs/fuse/fuse_i.h
> @@ -438,6 +438,9 @@ struct fuse_req {
>
>         /** fuse_mount this request belongs to */
>         struct fuse_mount *fm;
> +
> +       /** When (in jiffies) the request was created */
> +       unsigned long create_time;
>  };
>
>  struct fuse_iqueue;
> @@ -923,6 +926,9 @@ struct fuse_conn {
>         /** IDR for backing files ids */
>         struct idr backing_files_map;
>  #endif
> +
> +       /** Request wait timeout check */
> +       struct delayed_work work;
>  };
>
>  /*
> @@ -1190,6 +1196,8 @@ void fuse_request_end(struct fuse_req *req);
>  /* Abort all requests */
>  void fuse_abort_conn(struct fuse_conn *fc);
>  void fuse_wait_aborted(struct fuse_conn *fc);
> +/* Connection timeout */
> +void fuse_check_timeout(struct work_struct *wk);
>
>  /**
>   * Invalidate inode attributes
> diff --git a/fs/fuse/inode.c b/fs/fuse/inode.c
> index 3ce4f4e81d09..ed96154df0fd 100644
> --- a/fs/fuse/inode.c
> +++ b/fs/fuse/inode.c
> @@ -23,6 +23,7 @@
>  #include <linux/exportfs.h>
>  #include <linux/posix_acl.h>
>  #include <linux/pid_namespace.h>
> +#include <linux/completion.h>
>  #include <uapi/linux/magic.h>
>
>  MODULE_AUTHOR("Miklos Szeredi <miklos@xxxxxxxxxx>");
> @@ -964,6 +965,7 @@ void fuse_conn_init(struct fuse_conn *fc, struct fuse_mount *fm,
>         INIT_LIST_HEAD(&fc->entry);
>         INIT_LIST_HEAD(&fc->devices);
>         atomic_set(&fc->num_waiting, 0);
> +       INIT_DELAYED_WORK(&fc->work, fuse_check_timeout);
>         fc->max_background = FUSE_DEFAULT_MAX_BACKGROUND;
>         fc->congestion_threshold = FUSE_DEFAULT_CONGESTION_THRESHOLD;
>         atomic64_set(&fc->khctr, 0);
> @@ -1015,6 +1017,7 @@ void fuse_conn_put(struct fuse_conn *fc)
>                 if (IS_ENABLED(CONFIG_FUSE_PASSTHROUGH))
>                         fuse_backing_files_free(fc);
>                 call_rcu(&fc->rcu, delayed_release);
> +               cancel_delayed_work_sync(&fc->work);
>         }
>  }
>  EXPORT_SYMBOL_GPL(fuse_conn_put);
> diff --git a/include/linux/sched/sysctl.h b/include/linux/sched/sysctl.h
> index 5a64582b086b..1ed3a511060d 100644
> --- a/include/linux/sched/sysctl.h
> +++ b/include/linux/sched/sysctl.h
> @@ -5,11 +5,15 @@
>  #include <linux/types.h>
>
>  #ifdef CONFIG_DETECT_HUNG_TASK
> -/* used for hung_task and block/ */
> +/* used for hung_task, block/ and fuse */
>  extern unsigned long sysctl_hung_task_timeout_secs;
> +extern unsigned int sysctl_hung_task_panic;
>  #else
>  /* Avoid need for ifdefs elsewhere in the code */
> -enum { sysctl_hung_task_timeout_secs = 0 };
> +enum {
> +       sysctl_hung_task_timeout_secs = 0,
> +       sysctl_hung_task_panic = 0,
> +};
>  #endif
>
>  enum sched_tunable_scaling {
> diff --git a/kernel/hung_task.c b/kernel/hung_task.c
> index c18717189f32..16602d3754b1 100644
> --- a/kernel/hung_task.c
> +++ b/kernel/hung_task.c
> @@ -78,8 +78,9 @@ static unsigned int __read_mostly sysctl_hung_task_all_cpu_backtrace;
>   * Should we panic (and reboot, if panic_timeout= is set) when a
>   * hung task is detected:
>   */
> -static unsigned int __read_mostly sysctl_hung_task_panic =
> +unsigned int __read_mostly sysctl_hung_task_panic =
>         IS_ENABLED(CONFIG_BOOTPARAM_HUNG_TASK_PANIC);
> +EXPORT_SYMBOL_GPL(sysctl_hung_task_panic);
>
>  static int
>  hung_task_panic(struct notifier_block *this, unsigned long event, void *ptr)
> --
> 2.34.1
>





[Index of Archives]     [Linux Ext4 Filesystem]     [Union Filesystem]     [Filesystem Testing]     [Ceph Users]     [Ecryptfs]     [NTFS 3]     [AutoFS]     [Kernel Newbies]     [Share Photos]     [Security]     [Netfilter]     [Bugtraq]     [Yosemite News]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux Cachefs]     [Reiser Filesystem]     [Linux RAID]     [NTFS 3]     [Samba]     [Device Mapper]     [CEPH Development]

  Powered by Linux