Re: [PATCH] fuse: Prevent hung task warning if FUSE server gets stuck

Joanne Koong <joannelkoong@xxxxxxxxx> · Thu, 5 Dec 2024 14:53:25 -0800

On Thu, Dec 5, 2024 at 9:10 AM Etienne <etmartin4313@xxxxxxxxx> wrote:
>
> On Wed, Dec 4, 2024 at 8:51 PM Jingbo Xu <jefflexu@xxxxxxxxxxxxxxxxx> wrote:
> >
> >
> >
> > On 12/5/24 12:43 AM, etmartin4313@xxxxxxxxx wrote:
> > > From: Etienne Martineau <etmartin4313@xxxxxxxxx>
> > >
> > > If hung task checking is enabled and FUSE server stops responding for a
> > > long period of time, the hung task timer may fire towards the FUSE clients
> > > and trigger stack dumps that unnecessarily alarm the user.
> >
> > Isn't that expected that users shall be notified that there's something
> > wrong with the FUSE service (because of either buggy implementation or
> > malicious purpose)?  Or is it expected that the normal latency of
> > handling a FUSE request is more than 30 seconds?
>
> In one way you're right because seeing those stack dumps tells you
> right away that something is wrong with a FUSE service.
> Having said that, with many FUSE services running, those stack dumps
> are not helpful at pointing out which of the FUSE services is having
> issues.
>
> Maybe we should instead have proper debug in place to dump the FUSE
> connection so that user can abort via
> /sys/fs/fuse/connections/'nn'/abort
> Something like "pr_warn("Fuse connection %u not responding\n", fc->dev);" maybe?

Having some identifying information about which connection is
unresponsive seems useful, but I don't see a straightforward way of
implementing this without adding per-request overhead.

>
> Also, now that you are pointing out a malicious implementation, I
> realized that on a system with 'hung_task_panic' set, a non-privileged
> user can easily trip the hung task timer and force a panic.
>
> I just tried the following sequence using FUSE sshfs and without this
> patch my system went down.
>
>  sudo bash -c 'echo 30 > /proc/sys/kernel/hung_task_timeout_secs'
>  sudo bash -c 'echo 1 > /proc/sys/kernel/hung_task_panic'
>  sshfs -o allow_other,default_permissions you@localhost:/home/you/test ./mnt
>  kill -STOP `pidof /usr/lib/openssh/sftp-server`
>  ls ./mnt/
>  ^C

I'm not sure if this addresses your particular use case, but there's a
patch upstream that adds request timeouts:
https://lore.kernel.org/linux-fsdevel/20241114191332.669127-1-joannelkoong@xxxxxxxxx/

The timeout can be set globally via sysctls (e.g.
"/proc/sys/fs/fuse/max_request_timeout") or on a per-server basis. If
the timeout elapses and the request has not been fulfilled (e.g. because
of a malicious or buggy fuse server), the kernel will abort the
connection automatically.
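
As a rough illustration of that behavior, here is a small userspace
model of the expiry check. This is only a sketch: the struct and
function names (model_conn, model_req, model_check_timeouts) are made
up for this example and are not taken from the patch, and treating a
timeout of 0 as "disabled" is an assumption of the model.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical userspace model; none of these names come from the patch. */
struct model_req {
	uint64_t sent_at;   /* time the request was queued, in seconds */
	bool     answered;  /* set once the server has replied */
};

struct model_conn {
	uint64_t timeout;   /* request timeout in seconds; 0 = disabled (assumed) */
	bool     aborted;   /* set once the connection has been aborted */
};

/* Return true if the request has been pending for at least the timeout. */
static bool model_req_expired(const struct model_conn *fc,
			      const struct model_req *req, uint64_t now)
{
	assert(fc != NULL && req != NULL);
	if (fc->timeout == 0 || req->answered)
		return false;
	/* Guard against a clock value earlier than the send time. */
	return now >= req->sent_at && now - req->sent_at >= fc->timeout;
}

/* Scan the requests and abort the whole connection on the first expiry. */
static void model_check_timeouts(struct model_conn *fc,
				 const struct model_req *reqs, size_t n,
				 uint64_t now)
{
	for (size_t i = 0; i < n && !fc->aborted; i++)
		if (model_req_expired(fc, &reqs[i], now))
			fc->aborted = true;
}
```

The point of the model is that a single unanswered request that outlives
the timeout takes down the whole connection, so waiters get an error
back instead of sitting blocked until the hung task timer fires.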

Thanks,
Joanne

>
> thanks,
> Etienne
>