On Wed, Aug 21, 2024 at 03:47:53PM +0200, Miklos Szeredi wrote:
> On Wed, 14 Aug 2024 at 01:23, Joanne Koong <joannelkoong@xxxxxxxxx> wrote:
> >
> > There are situations where fuse servers can become unresponsive or take
> > too long to reply to a request. Currently there is no upper bound on
> > how long a request may take, which may be frustrating to users who get
> > stuck waiting for a request to complete.
> >
> > This patchset adds a timeout option for requests and two dynamically
> > configurable fuse sysctls "default_request_timeout" and "max_request_timeout"
> > for controlling/enforcing timeout behavior system-wide.
> >
> > Existing fuse servers will not be affected unless they explicitly opt into the
> > timeout.
>
> I sort of understand the motivation, but do not clearly see why this
> is required.
>
> A well written server will be able to do request timeouts properly,
> without the kernel having to cut off requests mid flight without the
> knowledge of the server. The latter could even be dangerous because
> locking guarantees previously provided by the kernel do not apply
> anymore.
>
> Can you please explain why this needs to be done by the client
> (kernel) instead of the server (userspace)?
>

"A well written server" is the key part here ;). In our case we had a
"well written server" that ended up having a deadlock, and we had to run
around with a drgn script to find those hung mounts and kill them
manually. The use case here is specifically bugs in the FUSE server:
the timeout lets us clean up automatically with EIOs, rather than
relying on a drgn script to figure out whether a mount is hung.

It also gives us the opportunity to do the things that Bernd points out,
specifically to remove the double buffering downside, since we can trust
that writeback will eventually either succeed or time out.

Thanks,

Josef