Re: [PATCH v2] fuse: Abort connection if FUSE server get stuck

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Thu, Dec 12, 2024 at 12:27 AM Lance Yang <ioworker0@xxxxxxxxx> wrote:
>
> On Thu, Dec 12, 2024 at 7:04 AM Etienne Martineau
> <etmartin4313@xxxxxxxxx> wrote:
> >
> > On Wed, Dec 11, 2024 at 5:04 PM Joanne Koong <joannelkoong@xxxxxxxxx> wrote:
> > >
> > > On Wed, Dec 11, 2024 at 11:45 AM <etmartin4313@xxxxxxxxx> wrote:
> > > >
> > > > From: Etienne Martineau <etmartin4313@xxxxxxxxx>
> > > >
> > > > This patch abort connection if HUNG_TASK_PANIC is set and a FUSE server
> > > > is getting stuck for too long. A slow FUSE server may tripped the
>
> A FUSE server is getting stuck for longer than hung_task_timeout_secs
> (the default is two minutes). Is it not buggy in the real-world?
Can be buggy OR malicious yes. FUSE server is a user-space process so all
bets are off.

> > > > hang check timer for legitimate reasons hence consider disabling
> > > > HUNG_TASK_PANIC in that scenario.
>
> Why not just consider increasing hung_task_timeout_secs if necessary?
What value is the right value then?

The timeout is global, so by increasing the timeout it will take
longer for the kernel
to report and act on different hung task scenarios.

HUNG_TASK_PANIC is a failsafe mechanism that helps prevent your system from
running headless for too long. Say you are in crypto-mining and your
box is getting
stuck on some FPGA drivers -- the process calling into that driver is
stuck in an
UNINTERRUPTIBLE wait somehow. Without HUNG_TASK_PANIC you'll lose
money because your box may end up sitting idling for hours doing nothing.

>
> > > >
> > > > Without this patch, an unresponsive / buggy / malicious FUSE server can
> > > > leave the clients in D state for a long period of time and on system where
> > > > HUNG_TASK_PANIC is set, trigger a catastrophic reload.
>
> Sorry, I don't see any sense in knowing whether HUNG_TASK_PANIC is enabled for
> FUSE servers. Or am I possibly missing something important?
Regular file-system drivers handles everything internally but FUSE on
the other hands,
delegate the file system operation to a user process ( FUSE server )
If the FUSE server is turning bad, you don't want to reload right?

A non-privileged user can  potentially exploit this flaw and trigger a
reload. I'm
surprised that this didn't get flagged before ( maybe I'm missing something ? )
IMO this is why I think something needs to be done for the stable
branch as well.

>
> If HUNG_TASK_PANIC is set, we should do a reload when a hung task is detected;
> this is working as expected IHMO.
Say when your browser hangs on your system, do you reload? FUSE server
is just another
process.

thanks
Etienne

> Thanks,
> Lance
>
> > > >
> > > > So, if HUNG_TASK_PANIC checking is enabled, we should wake up periodically
> > > > to abort connections that exceed the timeout value which is define to be
> > > > half the HUNG_TASK_TIMEOUT period, which keeps overhead low. The timer
> > > > is per connection and runs only if there are active FUSE request pending.
> > >
> > > Hi Etienne,
> > >
> > > For your use case, does the generic request timeouts logic and
> > > max_request_timeout systemctl implemented in [1] and [2] not suffice?
> > > IMO I don't think we should have logic specifically checking for hung
> > > task timeouts in fuse, if the generic solution can be used.
> > >
> > > Thanks,
> > > Joanne
> >
> > We need a way to avoid catastrophic reloads on systems where HUNG_TASK_PANIC
> > is set while a buggy / malicious FUSE server stops responding.
> > I would argue that this is much needed in stable branches as well...
> >
> > For that reason, I believe we need to keep things simple for step #1
> > e.g. there is no
> > need to introduce another knob as we already have HUNG_TASK_TIMEOUT which
> > holds the source of truth.
> >
> > IMO introducing those new knobs will put an unnecessary burden on sysadmins into
> > something that is error prone because unlike
> >   CONFIG_DETECT_HUNG_TASK=y
> >   CONFIG_DEFAULT_HUNG_TASK_TIMEOUT=120
> > which is built-in, the "default_request_timeout" /
> > "max_request_timeout" needs to be
> > set appropriately after every reboot and failure to do so may have
> > nasty consequences.
> >
> > For the more generic cases then yes those timeouts make sense to me.
> >
> > Thanks,
> > Etienne
> >
> > >
> > > [1] https://lore.kernel.org/linux-fsdevel/20241114191332.669127-1-joannelkoong@xxxxxxxxx/
> > > [2] https://lore.kernel.org/linux-fsdevel/20241114191332.669127-4-joannelkoong@xxxxxxxxx/
> > >
> > > >
> > > > A FUSE client can get into D state as such ( see below scenario #1 / #2 )
> > > >  1) request_wait_answer() -> wait_event() is UNINTERRUPTIBLE
> > > >     OR
> > > >  2) request_wait_answer() -> wait_event_(interruptible / killable) is head
> > > >     of line blocking for subsequent clients accessing the same file
> > > >
> > > >         scenario #1:
> > > >         2716 pts/2    D+     0:00 cat
> > > >         $ cat /proc/2716/stack
> > > >                 [<0>] request_wait_answer+0x22e/0x340
> > > >                 [<0>] __fuse_simple_request+0xd8/0x2c0
> > > >                 [<0>] fuse_perform_write+0x3ec/0x760
> > > >                 [<0>] fuse_file_write_iter+0x3d5/0x3f0
> > > >                 [<0>] vfs_write+0x313/0x430
> > > >                 [<0>] ksys_write+0x6a/0xf0
> > > >                 [<0>] __x64_sys_write+0x19/0x30
> > > >                 [<0>] x64_sys_call+0x18ab/0x26f0
> > > >                 [<0>] do_syscall_64+0x7c/0x170
> > > >                 [<0>] entry_SYSCALL_64_after_hwframe+0x76/0x7e
> > > >
> > > >         scenario #2:
> > > >         2962 pts/2    S+     0:00 cat
> > > >         2963 pts/3    D+     0:00 cat
> > > >         $ cat /proc/2962/stack
> > > >                 [<0>] request_wait_answer+0x140/0x340
> > > >                 [<0>] __fuse_simple_request+0xd8/0x2c0
> > > >                 [<0>] fuse_perform_write+0x3ec/0x760
> > > >                 [<0>] fuse_file_write_iter+0x3d5/0x3f0
> > > >                 [<0>] vfs_write+0x313/0x430
> > > >                 [<0>] ksys_write+0x6a/0xf0
> > > >                 [<0>] __x64_sys_write+0x19/0x30
> > > >                 [<0>] x64_sys_call+0x18ab/0x26f0
> > > >                 [<0>] do_syscall_64+0x7c/0x170
> > > >                 [<0>] entry_SYSCALL_64_after_hwframe+0x76/0x7e
> > > >         $ cat /proc/2963/stack
> > > >                 [<0>] fuse_file_write_iter+0x252/0x3f0
> > > >                 [<0>] vfs_write+0x313/0x430
> > > >                 [<0>] ksys_write+0x6a/0xf0
> > > >                 [<0>] __x64_sys_write+0x19/0x30
> > > >                 [<0>] x64_sys_call+0x18ab/0x26f0
> > > >                 [<0>] do_syscall_64+0x7c/0x170
> > > >                 [<0>] entry_SYSCALL_64_after_hwframe+0x76/0x7e
> > > >
> > > > Please note that this patch doesn't prevent the HUNG_TASK_WARNING.
> > > >
> > > > Signed-off-by: Etienne Martineau <etmartin4313@xxxxxxxxx>
> > > > ---
> > > >  fs/fuse/dev.c                | 100 +++++++++++++++++++++++++++++++++++
> > > >  fs/fuse/fuse_i.h             |   8 +++
> > > >  fs/fuse/inode.c              |   3 ++
> > > >  include/linux/sched/sysctl.h |   8 ++-
> > > >  kernel/hung_task.c           |   3 +-
> > > >  5 files changed, 119 insertions(+), 3 deletions(-)
> > > >
> > > > diff --git a/fs/fuse/dev.c b/fs/fuse/dev.c
> > > > index 27ccae63495d..73d19de14e51 100644
> > > > --- a/fs/fuse/dev.c
> > > > +++ b/fs/fuse/dev.c
> > > > @@ -21,6 +21,8 @@
> > > >  #include <linux/swap.h>
> > > >  #include <linux/splice.h>
> > > >  #include <linux/sched.h>
> > > > +#include <linux/completion.h>
> > > > +#include <linux/sched/sysctl.h>
> > > >
> > > >  #define CREATE_TRACE_POINTS
> > > >  #include "fuse_trace.h"
> > > > @@ -45,14 +47,104 @@ static struct fuse_dev *fuse_get_dev(struct file *file)
> > > >         return READ_ONCE(file->private_data);
> > > >  }
> > > >
> > > > +static bool request_expired(struct fuse_conn *fc, struct fuse_req *req,
> > > > +               int timeout)
> > > > +{
> > > > +       return time_after(jiffies, req->create_time + timeout);
> > > > +}
> > > > +
> > > > +/*
> > > > + * Prevent hung task timer from firing at us
> > > > + * Check if any requests aren't being completed by the specified request
> > > > + * timeout. To do so, we:
> > > > + * - check the fiq pending list
> > > > + * - check the bg queue
> > > > + * - check the fpq io and processing lists
> > > > + *
> > > > + * To make this fast, we only check against the head request on each list since
> > > > + * these are generally queued in order of creation time (eg newer requests get
> > > > + * queued to the tail). We might miss a few edge cases (eg requests transitioning
> > > > + * between lists, re-sent requests at the head of the pending list having a
> > > > + * later creation time than other requests on that list, etc.) but that is fine
> > > > + * since if the request never gets fulfilled, it will eventually be caught.
> > > > + */
> > > > +void fuse_check_timeout(struct work_struct *wk)
> > > > +{
> > > > +       unsigned long hang_check_timer = sysctl_hung_task_timeout_secs * (HZ / 2);
> > > > +       struct fuse_conn *fc = container_of(wk, struct fuse_conn, work.work);
> > > > +       struct fuse_iqueue *fiq = &fc->iq;
> > > > +       struct fuse_req *req;
> > > > +       struct fuse_dev *fud;
> > > > +       struct fuse_pqueue *fpq;
> > > > +       bool expired = false;
> > > > +       int i;
> > > > +
> > > > +       spin_lock(&fiq->lock);
> > > > +       req = list_first_entry_or_null(&fiq->pending, struct fuse_req, list);
> > > > +       if (req)
> > > > +               expired = request_expired(fc, req, hang_check_timer);
> > > > +       spin_unlock(&fiq->lock);
> > > > +       if (expired)
> > > > +               goto abort_conn;
> > > > +
> > > > +       spin_lock(&fc->bg_lock);
> > > > +       req = list_first_entry_or_null(&fc->bg_queue, struct fuse_req, list);
> > > > +       if (req)
> > > > +               expired = request_expired(fc, req, hang_check_timer);
> > > > +       spin_unlock(&fc->bg_lock);
> > > > +       if (expired)
> > > > +               goto abort_conn;
> > > > +
> > > > +       spin_lock(&fc->lock);
> > > > +       if (!fc->connected) {
> > > > +               spin_unlock(&fc->lock);
> > > > +               return;
> > > > +       }
> > > > +       list_for_each_entry(fud, &fc->devices, entry) {
> > > > +               fpq = &fud->pq;
> > > > +               spin_lock(&fpq->lock);
> > > > +               req = list_first_entry_or_null(&fpq->io, struct fuse_req, list);
> > > > +               if (req && request_expired(fc, req, hang_check_timer))
> > > > +                       goto fpq_abort;
> > > > +
> > > > +               for (i = 0; i < FUSE_PQ_HASH_SIZE; i++) {
> > > > +                       req = list_first_entry_or_null(&fpq->processing[i], struct fuse_req, list);
> > > > +                       if (req && request_expired(fc, req, hang_check_timer))
> > > > +                               goto fpq_abort;
> > > > +               }
> > > > +               spin_unlock(&fpq->lock);
> > > > +       }
> > > > +       /* Keep the ball rolling */
> > > > +       if (atomic_read(&fc->num_waiting) != 0)
> > > > +               queue_delayed_work(system_wq, &fc->work, hang_check_timer);
> > > > +       spin_unlock(&fc->lock);
> > > > +       return;
> > > > +
> > > > +fpq_abort:
> > > > +       spin_unlock(&fpq->lock);
> > > > +       spin_unlock(&fc->lock);
> > > > +abort_conn:
> > > > +       fuse_abort_conn(fc);
> > > > +}
> > > > +
> > > >  static void fuse_request_init(struct fuse_mount *fm, struct fuse_req *req)
> > > >  {
> > > > +       struct fuse_conn *fc = fm->fc;
> > > >         INIT_LIST_HEAD(&req->list);
> > > >         INIT_LIST_HEAD(&req->intr_entry);
> > > >         init_waitqueue_head(&req->waitq);
> > > >         refcount_set(&req->count, 1);
> > > >         __set_bit(FR_PENDING, &req->flags);
> > > >         req->fm = fm;
> > > > +       req->create_time = jiffies;
> > > > +
> > > > +       if (sysctl_hung_task_panic) {
> > > > +               spin_lock(&fc->lock);
> > > > +               /* Get the ball rolling */
> > > > +               queue_delayed_work(system_wq, &fc->work,
> > > > +                               sysctl_hung_task_timeout_secs * (HZ / 2));
> > > > +               spin_unlock(&fc->lock);
> > > > +       }
> > > >  }
> > > >
> > > >  static struct fuse_req *fuse_request_alloc(struct fuse_mount *fm, gfp_t flags)
> > > > @@ -200,6 +292,14 @@ static void fuse_put_request(struct fuse_req *req)
> > > >                         fuse_drop_waiting(fc);
> > > >                 }
> > > >
> > > > +               if (sysctl_hung_task_panic) {
> > > > +                       spin_lock(&fc->lock);
> > > > +                       /* Stop the timeout check if we are the last request */
> > > > +                       if (atomic_read(&fc->num_waiting) == 0)
> > > > +                               cancel_delayed_work_sync(&fc->work);
> > > > +                       spin_unlock(&fc->lock);
> > > > +               }
> > > > +
> > > >                 fuse_request_free(req);
> > > >         }
> > > >  }
> > > > diff --git a/fs/fuse/fuse_i.h b/fs/fuse/fuse_i.h
> > > > index 74744c6f2860..aba3ffd0fb67 100644
> > > > --- a/fs/fuse/fuse_i.h
> > > > +++ b/fs/fuse/fuse_i.h
> > > > @@ -438,6 +438,9 @@ struct fuse_req {
> > > >
> > > >         /** fuse_mount this request belongs to */
> > > >         struct fuse_mount *fm;
> > > > +
> > > > +       /** When (in jiffies) the request was created */
> > > > +       unsigned long create_time;
> > > >  };
> > > >
> > > >  struct fuse_iqueue;
> > > > @@ -923,6 +926,9 @@ struct fuse_conn {
> > > >         /** IDR for backing files ids */
> > > >         struct idr backing_files_map;
> > > >  #endif
> > > > +
> > > > +       /** Request wait timeout check */
> > > > +       struct delayed_work work;
> > > >  };
> > > >
> > > >  /*
> > > > @@ -1190,6 +1196,8 @@ void fuse_request_end(struct fuse_req *req);
> > > >  /* Abort all requests */
> > > >  void fuse_abort_conn(struct fuse_conn *fc);
> > > >  void fuse_wait_aborted(struct fuse_conn *fc);
> > > > +/* Connection timeout */
> > > > +void fuse_check_timeout(struct work_struct *wk);
> > > >
> > > >  /**
> > > >   * Invalidate inode attributes
> > > > diff --git a/fs/fuse/inode.c b/fs/fuse/inode.c
> > > > index 3ce4f4e81d09..ed96154df0fd 100644
> > > > --- a/fs/fuse/inode.c
> > > > +++ b/fs/fuse/inode.c
> > > > @@ -23,6 +23,7 @@
> > > >  #include <linux/exportfs.h>
> > > >  #include <linux/posix_acl.h>
> > > >  #include <linux/pid_namespace.h>
> > > > +#include <linux/completion.h>
> > > >  #include <uapi/linux/magic.h>
> > > >
> > > >  MODULE_AUTHOR("Miklos Szeredi <miklos@xxxxxxxxxx>");
> > > > @@ -964,6 +965,7 @@ void fuse_conn_init(struct fuse_conn *fc, struct fuse_mount *fm,
> > > >         INIT_LIST_HEAD(&fc->entry);
> > > >         INIT_LIST_HEAD(&fc->devices);
> > > >         atomic_set(&fc->num_waiting, 0);
> > > > +       INIT_DELAYED_WORK(&fc->work, fuse_check_timeout);
> > > >         fc->max_background = FUSE_DEFAULT_MAX_BACKGROUND;
> > > >         fc->congestion_threshold = FUSE_DEFAULT_CONGESTION_THRESHOLD;
> > > >         atomic64_set(&fc->khctr, 0);
> > > > @@ -1015,6 +1017,7 @@ void fuse_conn_put(struct fuse_conn *fc)
> > > >                 if (IS_ENABLED(CONFIG_FUSE_PASSTHROUGH))
> > > >                         fuse_backing_files_free(fc);
> > > >                 call_rcu(&fc->rcu, delayed_release);
> > > > +               cancel_delayed_work_sync(&fc->work);
> > > >         }
> > > >  }
> > > >  EXPORT_SYMBOL_GPL(fuse_conn_put);
> > > > diff --git a/include/linux/sched/sysctl.h b/include/linux/sched/sysctl.h
> > > > index 5a64582b086b..1ed3a511060d 100644
> > > > --- a/include/linux/sched/sysctl.h
> > > > +++ b/include/linux/sched/sysctl.h
> > > > @@ -5,11 +5,15 @@
> > > >  #include <linux/types.h>
> > > >
> > > >  #ifdef CONFIG_DETECT_HUNG_TASK
> > > > -/* used for hung_task and block/ */
> > > > +/* used for hung_task, block/ and fuse */
> > > >  extern unsigned long sysctl_hung_task_timeout_secs;
> > > > +extern unsigned int sysctl_hung_task_panic;
> > > >  #else
> > > >  /* Avoid need for ifdefs elsewhere in the code */
> > > > -enum { sysctl_hung_task_timeout_secs = 0 };
> > > > +enum {
> > > > +       sysctl_hung_task_timeout_secs = 0,
> > > > +       sysctl_hung_task_panic = 0,
> > > > +};
> > > >  #endif
> > > >
> > > >  enum sched_tunable_scaling {
> > > > diff --git a/kernel/hung_task.c b/kernel/hung_task.c
> > > > index c18717189f32..16602d3754b1 100644
> > > > --- a/kernel/hung_task.c
> > > > +++ b/kernel/hung_task.c
> > > > @@ -78,8 +78,9 @@ static unsigned int __read_mostly sysctl_hung_task_all_cpu_backtrace;
> > > >   * Should we panic (and reboot, if panic_timeout= is set) when a
> > > >   * hung task is detected:
> > > >   */
> > > > -static unsigned int __read_mostly sysctl_hung_task_panic =
> > > > +unsigned int __read_mostly sysctl_hung_task_panic =
> > > >         IS_ENABLED(CONFIG_BOOTPARAM_HUNG_TASK_PANIC);
> > > > +EXPORT_SYMBOL_GPL(sysctl_hung_task_panic);
> > > >
> > > >  static int
> > > >  hung_task_panic(struct notifier_block *this, unsigned long event, void *ptr)
> > > > --
> > > > 2.34.1
> > > >





[Index of Archives]     [Linux Ext4 Filesystem]     [Union Filesystem]     [Filesystem Testing]     [Ceph Users]     [Ecryptfs]     [NTFS 3]     [AutoFS]     [Kernel Newbies]     [Share Photos]     [Security]     [Netfilter]     [Bugtraq]     [Yosemite News]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux Cachefs]     [Reiser Filesystem]     [Linux RAID]     [NTFS 3]     [Samba]     [Device Mapper]     [CEPH Development]

  Powered by Linux