Re: [PATCH v10 1/2] fuse: add kernel-enforced timeout option for requests

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Sat, Dec 14, 2024 at 4:10 AM Jeff Layton <jlayton@xxxxxxxxxx> wrote:
>
> On Fri, 2024-12-13 at 18:28 -0800, Joanne Koong wrote:
> > There are situations where fuse servers can become unresponsive or
> > stuck, for example if the server is deadlocked. Currently, there's no
> > good way to detect if a server is stuck and needs to be killed manually.
> >
> > This commit adds an option for enforcing a timeout (in seconds) for
> > requests where if the timeout elapses without the server responding to
> > the request, the connection will be automatically aborted.
> >
> > Please note that these timeouts are not 100% precise. For example, the
> > request may take roughly an extra FUSE_TIMEOUT_TIMER_FREQ seconds beyond
> > the requested timeout due to internal implementation, in order to
> > mitigate overhead.
> >
> > Signed-off-by: Joanne Koong <joannelkoong@xxxxxxxxx>
> > ---
> >  fs/fuse/dev.c    | 83 ++++++++++++++++++++++++++++++++++++++++++++++++
> >  fs/fuse/fuse_i.h | 22 +++++++++++++
> >  fs/fuse/inode.c  | 23 ++++++++++++++
> >  3 files changed, 128 insertions(+)
> >
> > diff --git a/fs/fuse/dev.c b/fs/fuse/dev.c
> > index 27ccae63495d..e97ba860ffcd 100644
> > --- a/fs/fuse/dev.c
> > +++ b/fs/fuse/dev.c
> >
> >  static struct fuse_req *fuse_request_alloc(struct fuse_mount *fm, gfp_t flags)
> > @@ -2308,6 +2388,9 @@ void fuse_abort_conn(struct fuse_conn *fc)
> >               spin_unlock(&fc->lock);
> >
> >               end_requests(&to_end);
> > +
> > +             if (fc->timeout.req_timeout)
> > +                     cancel_delayed_work(&fc->timeout.work);
>
> As Sergey pointed out, this should be a cancel_delayed_work_sync(). The
> workqueue job can still be running after cancel_delayed_work(), and
> since it requeues itself, this might not be enough to kill it
> completely.

I don't think we need to synchronously cancel it when a connection is
aborted. The fuse_check_timeout() workqueue job can be simultaneously
running when cancel_delayed_work() is called and can requeue itself,
but then on the next trigger of the job, it will check whether the
connection was aborted (eg the if (!fc->connected)... return; lines in
fuse_check_timeout()) and will not requeue itself if the connection
was aborted. This seemed like the simplest / cleanest approach to me.

>
> Also, I'd probably do this at the start of fuse_abort_conn() instead of
> waiting until the end. By the time you're in that function, you're
> killing the connection anyway, and you probably don't want the
> workqueue job running at the same time. They'll just end up competing
> for the same locks.

Sounds good, I'll move this to be called right after the "if
(fc->connected)" line.


Thanks,
Joanne
>
> >       } else {
> >               spin_unlock(&fc->lock);
> >       }

> --
> Jeff Layton <jlayton@xxxxxxxxxx>





[Index of Archives]     [Linux Ext4 Filesystem]     [Union Filesystem]     [Filesystem Testing]     [Ceph Users]     [Ecryptfs]     [NTFS 3]     [AutoFS]     [Kernel Newbies]     [Share Photos]     [Security]     [Netfilter]     [Bugtraq]     [Yosemite News]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux Cachefs]     [Reiser Filesystem]     [Linux RAID]     [NTFS 3]     [Samba]     [Device Mapper]     [CEPH Development]

  Powered by Linux