On 9/2/24 12:37, Miklos Szeredi wrote:
> On Fri, 30 Aug 2024 at 18:27, Joanne Koong <joannelkoong@xxxxxxxxx> wrote:
>>
>> There are situations where fuse servers can become unresponsive or
>> stuck, for example if the server is in a deadlock. Currently, there's
>> no good way to detect if a server is stuck and needs to be killed
>> manually.
>>
>> This commit adds an option for enforcing a timeout (in seconds) on
>> requests where if the timeout elapses without a reply from the server,
>> the connection will be automatically aborted.
>
> Okay.
>
> I'm not sure what the overhead (scheduling and memory) of timers, but
> starting one for each request seems excessive.
>
> Can we make the timeout per-connection instead of per request?
>
> I.e. When the first request is sent, the timer is started. When a
> reply is received but there are still outstanding requests, the timer
> is reset. When the last reply is received, the timer is stopped.
>
> This should handle the frozen server case just as well. It may not
> perfectly handle the case when the server is still alive but for some
> reason one or more requests get stuck, while others are still being
> processed. The latter case is unlikely to be an issue in practice,
> IMO.

With distributed servers, it can easily happen that one server has an
issue while the other servers still process requests, especially when
those are just requests that read, getattr, etc. and do not write, i.e.
when the other servers do not need to access the stuck server. So in my
opinion this is not so unlikely, although in such cases it is not
difficult to time out within the fuse server itself.

Thanks,
Bernd
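
For illustration, the per-connection scheme described in the quoted text (start the timer on the first request, reset it on a reply while requests remain outstanding, stop it on the last reply) can be modelled in a few lines of userspace C. This is only a sketch under stated assumptions: `struct conn_timer` and all function names are hypothetical, time is a plain integer in seconds rather than a kernel timer, and none of this is taken from the actual patch. The two demo functions contrast the frozen-server case with the case where one request is stuck while others are still answered.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical userspace model of one connection; not the kernel's fuse_conn. */
struct conn_timer {
    uint64_t timeout;      /* seconds without any reply before aborting */
    uint64_t deadline;     /* only meaningful while armed is true */
    unsigned outstanding;  /* requests sent but not yet answered */
    bool armed;
    bool aborted;
};

void conn_init(struct conn_timer *c, uint64_t timeout)
{
    c->timeout = timeout;
    c->deadline = 0;
    c->outstanding = 0;
    c->armed = false;
    c->aborted = false;
}

/* Request sent at time `now`: start the timer only for the first one. */
void conn_request_sent(struct conn_timer *c, uint64_t now)
{
    if (c->outstanding++ == 0) {
        c->deadline = now + c->timeout;
        c->armed = true;
    }
}

/* Reply received at time `now`: stop on the last reply, otherwise reset. */
void conn_reply_received(struct conn_timer *c, uint64_t now)
{
    assert(c->outstanding > 0);
    if (--c->outstanding == 0)
        c->armed = false;
    else
        c->deadline = now + c->timeout;
}

/* Check for expiry at time `now`; returns true once the connection is aborted. */
bool conn_tick(struct conn_timer *c, uint64_t now)
{
    if (c->armed && now >= c->deadline) {
        c->aborted = true;
        c->armed = false;
    }
    return c->aborted;
}

/* Frozen server: no reply ever arrives, so the timer expires. */
bool demo_frozen_server(void)
{
    struct conn_timer c;
    conn_init(&c, 10);
    conn_request_sent(&c, 0);
    conn_request_sent(&c, 3);   /* does not restart the timer */
    return conn_tick(&c, 10);
}

/* One request stuck while others are answered: every reply resets the timer. */
bool demo_one_stuck_request(void)
{
    struct conn_timer c;
    conn_init(&c, 10);
    conn_request_sent(&c, 0);               /* never answered */
    for (uint64_t t = 1; t <= 100; t += 5) {
        conn_request_sent(&c, t);           /* a second request ... */
        conn_reply_received(&c, t + 1);     /* ... answered a second later */
        if (conn_tick(&c, t + 2))
            return true;
    }
    return false;
}
```

In this model `demo_frozen_server()` reports an abort, while `demo_one_stuck_request()` does not: the stuck request stays outstanding for the whole run because each unrelated reply pushes the deadline out again. That is the gap between the per-connection and per-request variants discussed above.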