Re: [PATCH v2] fuse: Abort connection if FUSE server get stuck

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Wed, Dec 11, 2024 at 3:04 PM Etienne Martineau
<etmartin4313@xxxxxxxxx> wrote:
>
> On Wed, Dec 11, 2024 at 5:04 PM Joanne Koong <joannelkoong@xxxxxxxxx> wrote:
> >
> > On Wed, Dec 11, 2024 at 11:45 AM <etmartin4313@xxxxxxxxx> wrote:
> > >
> > > From: Etienne Martineau <etmartin4313@xxxxxxxxx>
> > >
> > > This patch abort connection if HUNG_TASK_PANIC is set and a FUSE server
> > > is getting stuck for too long. A slow FUSE server may tripped the
> > > hang check timer for legitimate reasons hence consider disabling
> > > HUNG_TASK_PANIC in that scenario.
> > >
> > > Without this patch, an unresponsive / buggy / malicious FUSE server can
> > > leave the clients in D state for a long period of time and on system where
> > > HUNG_TASK_PANIC is set, trigger a catastrophic reload.
> > >
> > > So, if HUNG_TASK_PANIC checking is enabled, we should wake up periodically
> > > to abort connections that exceed the timeout value which is define to be
> > > half the HUNG_TASK_TIMEOUT period, which keeps overhead low. The timer
> > > is per connection and runs only if there are active FUSE request pending.
> >
> > Hi Etienne,
> >
> > For your use case, does the generic request timeouts logic and
> > max_request_timeout systemctl implemented in [1] and [2] not suffice?
> > IMO I don't think we should have logic specifically checking for hung
> > task timeouts in fuse, if the generic solution can be used.
> >
> > Thanks,
> > Joanne
>
> We need a way to avoid catastrophic reloads on systems where HUNG_TASK_PANIC
> is set while a buggy / malicious FUSE server stops responding.
> I would argue that this is much needed in stable branches as well...
>
> For that reason, I believe we need to keep things simple for step #1
> e.g. there is no
> need to introduce another knob as we already have HUNG_TASK_TIMEOUT which
> holds the source of truth.
>
> IMO introducing those new knobs will put an unnecessary burden on sysadmins into
> something that is error prone because unlike
>   CONFIG_DETECT_HUNG_TASK=y
>   CONFIG_DEFAULT_HUNG_TASK_TIMEOUT=120
> which is built-in, the "default_request_timeout" /
> "max_request_timeout" needs to be
> set appropriately after every reboot and failure to do so may have
> nasty consequences.

imo, it is not important to directly defend against the hung task case
inside the fuse code itself. imo, the generic timeout should be used
instead. As I understand it, hung task panic is mostly enabled for
debug purposes and is enabled through a sysctl. imo, if the system
admin enables the hung task panic sysctl value, then it is not too
much of a burden for them to also set the fuse max request timeout
sysctl.


Thanks,
Joanne

>
> For the more generic cases then yes those timeouts make sense to me.
>
> Thanks,
> Etienne
>
> >
> > [1] https://lore.kernel.org/linux-fsdevel/20241114191332.669127-1-joannelkoong@xxxxxxxxx/
> > [2] https://lore.kernel.org/linux-fsdevel/20241114191332.669127-4-joannelkoong@xxxxxxxxx/
> >
> > >
> > > A FUSE client can get into D state as such ( see below scenario #1 / #2 )
> > >  1) request_wait_answer() -> wait_event() is UNINTERRUPTIBLE
> > >     OR
> > >  2) request_wait_answer() -> wait_event_(interruptible / killable) is head
> > >     of line blocking for subsequent clients accessing the same file
> > >
> > >         scenario #1:
> > >         2716 pts/2    D+     0:00 cat
> > >         $ cat /proc/2716/stack
> > >                 [<0>] request_wait_answer+0x22e/0x340
> > >                 [<0>] __fuse_simple_request+0xd8/0x2c0
> > >                 [<0>] fuse_perform_write+0x3ec/0x760
> > >                 [<0>] fuse_file_write_iter+0x3d5/0x3f0
> > >                 [<0>] vfs_write+0x313/0x430
> > >                 [<0>] ksys_write+0x6a/0xf0
> > >                 [<0>] __x64_sys_write+0x19/0x30
> > >                 [<0>] x64_sys_call+0x18ab/0x26f0
> > >                 [<0>] do_syscall_64+0x7c/0x170
> > >                 [<0>] entry_SYSCALL_64_after_hwframe+0x76/0x7e
> > >
> > >         scenario #2:
> > >         2962 pts/2    S+     0:00 cat
> > >         2963 pts/3    D+     0:00 cat
> > >         $ cat /proc/2962/stack
> > >                 [<0>] request_wait_answer+0x140/0x340
> > >                 [<0>] __fuse_simple_request+0xd8/0x2c0
> > >                 [<0>] fuse_perform_write+0x3ec/0x760
> > >                 [<0>] fuse_file_write_iter+0x3d5/0x3f0
> > >                 [<0>] vfs_write+0x313/0x430
> > >                 [<0>] ksys_write+0x6a/0xf0
> > >                 [<0>] __x64_sys_write+0x19/0x30
> > >                 [<0>] x64_sys_call+0x18ab/0x26f0
> > >                 [<0>] do_syscall_64+0x7c/0x170
> > >                 [<0>] entry_SYSCALL_64_after_hwframe+0x76/0x7e
> > >         $ cat /proc/2963/stack
> > >                 [<0>] fuse_file_write_iter+0x252/0x3f0
> > >                 [<0>] vfs_write+0x313/0x430
> > >                 [<0>] ksys_write+0x6a/0xf0
> > >                 [<0>] __x64_sys_write+0x19/0x30
> > >                 [<0>] x64_sys_call+0x18ab/0x26f0
> > >                 [<0>] do_syscall_64+0x7c/0x170
> > >                 [<0>] entry_SYSCALL_64_after_hwframe+0x76/0x7e
> > >
> > > Please note that this patch doesn't prevent the HUNG_TASK_WARNING.
> > >
> > > Signed-off-by: Etienne Martineau <etmartin4313@xxxxxxxxx>
> > > ---
> > >  fs/fuse/dev.c                | 100 +++++++++++++++++++++++++++++++++++
> > >  fs/fuse/fuse_i.h             |   8 +++
> > >  fs/fuse/inode.c              |   3 ++
> > >  include/linux/sched/sysctl.h |   8 ++-
> > >  kernel/hung_task.c           |   3 +-
> > >  5 files changed, 119 insertions(+), 3 deletions(-)
> > >
> > > diff --git a/fs/fuse/dev.c b/fs/fuse/dev.c
> > > index 27ccae63495d..73d19de14e51 100644
> > > --- a/fs/fuse/dev.c
> > > +++ b/fs/fuse/dev.c
> > > @@ -21,6 +21,8 @@
> > >  #include <linux/swap.h>
> > >  #include <linux/splice.h>
> > >  #include <linux/sched.h>
> > > +#include <linux/completion.h>
> > > +#include <linux/sched/sysctl.h>
> > >
> > >  #define CREATE_TRACE_POINTS
> > >  #include "fuse_trace.h"
> > > @@ -45,14 +47,104 @@ static struct fuse_dev *fuse_get_dev(struct file *file)
> > >         return READ_ONCE(file->private_data);
> > >  }
> > >
> > > +static bool request_expired(struct fuse_conn *fc, struct fuse_req *req,
> > > +               int timeout)
> > > +{
> > > +       return time_after(jiffies, req->create_time + timeout);
> > > +}
> > > +
> > > +/*
> > > + * Prevent hung task timer from firing at us
> > > + * Check if any requests aren't being completed by the specified request
> > > + * timeout. To do so, we:
> > > + * - check the fiq pending list
> > > + * - check the bg queue
> > > + * - check the fpq io and processing lists
> > > + *
> > > + * To make this fast, we only check against the head request on each list since
> > > + * these are generally queued in order of creation time (eg newer requests get
> > > + * queued to the tail). We might miss a few edge cases (eg requests transitioning
> > > + * between lists, re-sent requests at the head of the pending list having a
> > > + * later creation time than other requests on that list, etc.) but that is fine
> > > + * since if the request never gets fulfilled, it will eventually be caught.
> > > + */
> > > +void fuse_check_timeout(struct work_struct *wk)
> > > +{
> > > +       unsigned long hang_check_timer = sysctl_hung_task_timeout_secs * (HZ / 2);
> > > +       struct fuse_conn *fc = container_of(wk, struct fuse_conn, work.work);
> > > +       struct fuse_iqueue *fiq = &fc->iq;
> > > +       struct fuse_req *req;
> > > +       struct fuse_dev *fud;
> > > +       struct fuse_pqueue *fpq;
> > > +       bool expired = false;
> > > +       int i;
> > > +
> > > +       spin_lock(&fiq->lock);
> > > +       req = list_first_entry_or_null(&fiq->pending, struct fuse_req, list);
> > > +       if (req)
> > > +               expired = request_expired(fc, req, hang_check_timer);
> > > +       spin_unlock(&fiq->lock);
> > > +       if (expired)
> > > +               goto abort_conn;
> > > +
> > > +       spin_lock(&fc->bg_lock);
> > > +       req = list_first_entry_or_null(&fc->bg_queue, struct fuse_req, list);
> > > +       if (req)
> > > +               expired = request_expired(fc, req, hang_check_timer);
> > > +       spin_unlock(&fc->bg_lock);
> > > +       if (expired)
> > > +               goto abort_conn;
> > > +
> > > +       spin_lock(&fc->lock);
> > > +       if (!fc->connected) {
> > > +               spin_unlock(&fc->lock);
> > > +               return;
> > > +       }
> > > +       list_for_each_entry(fud, &fc->devices, entry) {
> > > +               fpq = &fud->pq;
> > > +               spin_lock(&fpq->lock);
> > > +               req = list_first_entry_or_null(&fpq->io, struct fuse_req, list);
> > > +               if (req && request_expired(fc, req, hang_check_timer))
> > > +                       goto fpq_abort;
> > > +
> > > +               for (i = 0; i < FUSE_PQ_HASH_SIZE; i++) {
> > > +                       req = list_first_entry_or_null(&fpq->processing[i], struct fuse_req, list);
> > > +                       if (req && request_expired(fc, req, hang_check_timer))
> > > +                               goto fpq_abort;
> > > +               }
> > > +               spin_unlock(&fpq->lock);
> > > +       }
> > > +       /* Keep the ball rolling */
> > > +       if (atomic_read(&fc->num_waiting) != 0)
> > > +               queue_delayed_work(system_wq, &fc->work, hang_check_timer);
> > > +       spin_unlock(&fc->lock);
> > > +       return;
> > > +
> > > +fpq_abort:
> > > +       spin_unlock(&fpq->lock);
> > > +       spin_unlock(&fc->lock);
> > > +abort_conn:
> > > +       fuse_abort_conn(fc);
> > > +}
> > > +
> > >  static void fuse_request_init(struct fuse_mount *fm, struct fuse_req *req)
> > >  {
> > > +       struct fuse_conn *fc = fm->fc;
> > >         INIT_LIST_HEAD(&req->list);
> > >         INIT_LIST_HEAD(&req->intr_entry);
> > >         init_waitqueue_head(&req->waitq);
> > >         refcount_set(&req->count, 1);
> > >         __set_bit(FR_PENDING, &req->flags);
> > >         req->fm = fm;
> > > +       req->create_time = jiffies;
> > > +
> > > +       if (sysctl_hung_task_panic) {
> > > +               spin_lock(&fc->lock);
> > > +               /* Get the ball rolling */
> > > +               queue_delayed_work(system_wq, &fc->work,
> > > +                               sysctl_hung_task_timeout_secs * (HZ / 2));
> > > +               spin_unlock(&fc->lock);
> > > +       }
> > >  }
> > >
> > >  static struct fuse_req *fuse_request_alloc(struct fuse_mount *fm, gfp_t flags)
> > > @@ -200,6 +292,14 @@ static void fuse_put_request(struct fuse_req *req)
> > >                         fuse_drop_waiting(fc);
> > >                 }
> > >
> > > +               if (sysctl_hung_task_panic) {
> > > +                       spin_lock(&fc->lock);
> > > +                       /* Stop the timeout check if we are the last request */
> > > +                       if (atomic_read(&fc->num_waiting) == 0)
> > > +                               cancel_delayed_work_sync(&fc->work);
> > > +                       spin_unlock(&fc->lock);
> > > +               }
> > > +
> > >                 fuse_request_free(req);
> > >         }
> > >  }
> > > diff --git a/fs/fuse/fuse_i.h b/fs/fuse/fuse_i.h
> > > index 74744c6f2860..aba3ffd0fb67 100644
> > > --- a/fs/fuse/fuse_i.h
> > > +++ b/fs/fuse/fuse_i.h
> > > @@ -438,6 +438,9 @@ struct fuse_req {
> > >
> > >         /** fuse_mount this request belongs to */
> > >         struct fuse_mount *fm;
> > > +
> > > +       /** When (in jiffies) the request was created */
> > > +       unsigned long create_time;
> > >  };
> > >
> > >  struct fuse_iqueue;
> > > @@ -923,6 +926,9 @@ struct fuse_conn {
> > >         /** IDR for backing files ids */
> > >         struct idr backing_files_map;
> > >  #endif
> > > +
> > > +       /** Request wait timeout check */
> > > +       struct delayed_work work;
> > >  };
> > >
> > >  /*
> > > @@ -1190,6 +1196,8 @@ void fuse_request_end(struct fuse_req *req);
> > >  /* Abort all requests */
> > >  void fuse_abort_conn(struct fuse_conn *fc);
> > >  void fuse_wait_aborted(struct fuse_conn *fc);
> > > +/* Connection timeout */
> > > +void fuse_check_timeout(struct work_struct *wk);
> > >
> > >  /**
> > >   * Invalidate inode attributes
> > > diff --git a/fs/fuse/inode.c b/fs/fuse/inode.c
> > > index 3ce4f4e81d09..ed96154df0fd 100644
> > > --- a/fs/fuse/inode.c
> > > +++ b/fs/fuse/inode.c
> > > @@ -23,6 +23,7 @@
> > >  #include <linux/exportfs.h>
> > >  #include <linux/posix_acl.h>
> > >  #include <linux/pid_namespace.h>
> > > +#include <linux/completion.h>
> > >  #include <uapi/linux/magic.h>
> > >
> > >  MODULE_AUTHOR("Miklos Szeredi <miklos@xxxxxxxxxx>");
> > > @@ -964,6 +965,7 @@ void fuse_conn_init(struct fuse_conn *fc, struct fuse_mount *fm,
> > >         INIT_LIST_HEAD(&fc->entry);
> > >         INIT_LIST_HEAD(&fc->devices);
> > >         atomic_set(&fc->num_waiting, 0);
> > > +       INIT_DELAYED_WORK(&fc->work, fuse_check_timeout);
> > >         fc->max_background = FUSE_DEFAULT_MAX_BACKGROUND;
> > >         fc->congestion_threshold = FUSE_DEFAULT_CONGESTION_THRESHOLD;
> > >         atomic64_set(&fc->khctr, 0);
> > > @@ -1015,6 +1017,7 @@ void fuse_conn_put(struct fuse_conn *fc)
> > >                 if (IS_ENABLED(CONFIG_FUSE_PASSTHROUGH))
> > >                         fuse_backing_files_free(fc);
> > >                 call_rcu(&fc->rcu, delayed_release);
> > > +               cancel_delayed_work_sync(&fc->work);
> > >         }
> > >  }
> > >  EXPORT_SYMBOL_GPL(fuse_conn_put);
> > > diff --git a/include/linux/sched/sysctl.h b/include/linux/sched/sysctl.h
> > > index 5a64582b086b..1ed3a511060d 100644
> > > --- a/include/linux/sched/sysctl.h
> > > +++ b/include/linux/sched/sysctl.h
> > > @@ -5,11 +5,15 @@
> > >  #include <linux/types.h>
> > >
> > >  #ifdef CONFIG_DETECT_HUNG_TASK
> > > -/* used for hung_task and block/ */
> > > +/* used for hung_task, block/ and fuse */
> > >  extern unsigned long sysctl_hung_task_timeout_secs;
> > > +extern unsigned int sysctl_hung_task_panic;
> > >  #else
> > >  /* Avoid need for ifdefs elsewhere in the code */
> > > -enum { sysctl_hung_task_timeout_secs = 0 };
> > > +enum {
> > > +       sysctl_hung_task_timeout_secs = 0,
> > > +       sysctl_hung_task_panic = 0,
> > > +};
> > >  #endif
> > >
> > >  enum sched_tunable_scaling {
> > > diff --git a/kernel/hung_task.c b/kernel/hung_task.c
> > > index c18717189f32..16602d3754b1 100644
> > > --- a/kernel/hung_task.c
> > > +++ b/kernel/hung_task.c
> > > @@ -78,8 +78,9 @@ static unsigned int __read_mostly sysctl_hung_task_all_cpu_backtrace;
> > >   * Should we panic (and reboot, if panic_timeout= is set) when a
> > >   * hung task is detected:
> > >   */
> > > -static unsigned int __read_mostly sysctl_hung_task_panic =
> > > +unsigned int __read_mostly sysctl_hung_task_panic =
> > >         IS_ENABLED(CONFIG_BOOTPARAM_HUNG_TASK_PANIC);
> > > +EXPORT_SYMBOL_GPL(sysctl_hung_task_panic);
> > >
> > >  static int
> > >  hung_task_panic(struct notifier_block *this, unsigned long event, void *ptr)
> > > --
> > > 2.34.1
> > >





[Index of Archives]     [Linux Ext4 Filesystem]     [Union Filesystem]     [Filesystem Testing]     [Ceph Users]     [Ecryptfs]     [NTFS 3]     [AutoFS]     [Kernel Newbies]     [Share Photos]     [Security]     [Netfilter]     [Bugtraq]     [Yosemite News]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux Cachefs]     [Reiser Filesystem]     [Linux RAID]     [NTFS 3]     [Samba]     [Device Mapper]     [CEPH Development]

  Powered by Linux