On Sat, Dec 14, 2024 at 4:10 AM Jeff Layton <jlayton@xxxxxxxxxx> wrote: > > On Fri, 2024-12-13 at 18:28 -0800, Joanne Koong wrote: > > There are situations where fuse servers can become unresponsive or > > stuck, for example if the server is deadlocked. Currently, there's no > > good way to detect if a server is stuck and needs to be killed manually. > > > > This commit adds an option for enforcing a timeout (in seconds) for > > requests where if the timeout elapses without the server responding to > > the request, the connection will be automatically aborted. > > > > Please note that these timeouts are not 100% precise. For example, the > > request may take roughly an extra FUSE_TIMEOUT_TIMER_FREQ seconds beyond > > the requested timeout due to internal implementation, in order to > > mitigate overhead. > > > > Signed-off-by: Joanne Koong <joannelkoong@xxxxxxxxx> > > --- > > fs/fuse/dev.c | 83 ++++++++++++++++++++++++++++++++++++++++++++++++ > > fs/fuse/fuse_i.h | 22 +++++++++++++ > > fs/fuse/inode.c | 23 ++++++++++++++ > > 3 files changed, 128 insertions(+) > > > > diff --git a/fs/fuse/dev.c b/fs/fuse/dev.c > > index 27ccae63495d..e97ba860ffcd 100644 > > --- a/fs/fuse/dev.c > > +++ b/fs/fuse/dev.c > > > > static struct fuse_req *fuse_request_alloc(struct fuse_mount *fm, gfp_t flags) > > @@ -2308,6 +2388,9 @@ void fuse_abort_conn(struct fuse_conn *fc) > > spin_unlock(&fc->lock); > > > > end_requests(&to_end); > > + > > + if (fc->timeout.req_timeout) > > + cancel_delayed_work(&fc->timeout.work); > > As Sergey pointed out, this should be a cancel_delayed_work_sync(). The > workqueue job can still be running after cancel_delayed_work(), and > since it requeues itself, this might not be enough to kill it > completely. I don't think we need to synchronously cancel it when a connection is aborted. The fuse_check_timeout() workqueue job can be simultaneously running when cancel_delayed_work() is called and can requeue itself, but then on the next trigger of the job, it will check whether the connection was aborted (eg the if (!fc->connected)... return; lines in fuse_check_timeout()) and will not requeue itself if the connection was aborted. This seemed like the simplest / cleanest approach to me. > > Also, I'd probably do this at the start of fuse_abort_conn() instead of > waiting until the end. By the time you're in that function, you're > killing the connection anyway, and you probably don't want the > workqueue job running at the same time. They'll just end up competing > for the same locks. Sounds good, I'll move this to be called right after the "if (fc->connected)" line. Thanks, Joanne > > > } else { > > spin_unlock(&fc->lock); > > } > -- > Jeff Layton <jlayton@xxxxxxxxxx>