Re: [PATCH RESEND v9 2/3] fuse: add optional kernel-enforced timeout for requests

Sergey Senozhatsky <senozhatsky@xxxxxxxxxxxx> · Tue, 3 Dec 2024 13:31:18 +0900

On (24/12/02 11:29), Joanne Koong wrote:
> > >> In those cases a 1-minute fuse timeout will overshoot HUNG_TASK_TIMEOUT
> > >> and then the question is whether HUNG_TASK_PANIC is set.
> > >>
> > >> On the other hand, setups that set a much lower timeout than
> > >> DEFAULT_HUNG_TASK_TIMEOUT=120 will have extra CPU activity regardless,
> > >> just because the watchdogs will run more often.
> > >>
> > >> Tomasz, any opinions?
> > >
> > > First of all, thanks everyone for looking into this.
> 
> Hi Sergey and Tomasz,
> 
> Sorry for the late reply - I was out the last couple of days. Thanks
> Bernd for weighing in and answering the questions!
> 
> > >
> > > How about keeping a list of requests in the FIFO order (in other
> > > words: first entry is the first to timeout) and whenever the first
> > > entry is being removed from the list (aka the request actually
> > > completes), re-arming the timer to the timeout of the next request in
> > > the list? This way we don't really have any timer firing unless there
> > > is really a request that timed out.
> 
> I think the issue with this is that we would likely end up wasting
> more CPU cycles. For a busy FUSE server, there could be hundreds
> (thousands?) of requests that happen within the span of
> FUSE_TIMEOUT_TIMER_FREQ seconds.
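A user-space model of Tomasz's FIFO scheme, and of the cost Joanne is worried about, might look like the sketch below. It assumes that every request on a connection shares one timeout (so submission order is also deadline order and the list stays sorted for free) and that time is an abstract tick count. `req_fifo`, `fifo_queue()`, `fifo_complete()`, `fifo_expired()` and the `MAX_REQS` cap are hypothetical names, not FUSE code. `timer_updates` counts every arm, re-arm or disarm; in the kernel each of those would presumably be a mod_timer() or del_timer() call.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_REQS 4096 /* hypothetical cap, keeps the model allocation-free */

/* One timer per connection, always tracking the oldest in-flight request. */
struct req_fifo {
	unsigned long deadline[MAX_REQS]; /* absolute expiry, indexed by id */
	bool completed[MAX_REQS];
	size_t head;                 /* oldest request still in flight */
	size_t tail;                 /* next request id to hand out */
	bool timer_armed;
	unsigned long timer_expires;
	unsigned long timer_updates; /* arm / re-arm / disarm operations */
};

/* Point the timer at the current head, or stop it if nothing is queued. */
static void fifo_update_timer(struct req_fifo *q)
{
	q->timer_armed = q->head < q->tail;
	q->timer_expires = q->timer_armed ? q->deadline[q->head] : 0;
	q->timer_updates++;
}

/* Queue a request at time `now`; returns its id. */
static size_t fifo_queue(struct req_fifo *q, unsigned long now,
			 unsigned long timeout)
{
	size_t id = q->tail;

	assert(id < MAX_REQS);
	q->deadline[id] = now + timeout;
	q->completed[id] = false;
	q->tail++;
	if (id == q->head)           /* list was empty: arm the timer */
		fifo_update_timer(q);
	return id;
}

/* Complete request `id`; only removing the head touches the timer. */
static void fifo_complete(struct req_fifo *q, size_t id)
{
	assert(id >= q->head && id < q->tail && !q->completed[id]);
	q->completed[id] = true;
	if (id != q->head)
		return;                  /* not the oldest: timer untouched */
	while (q->head < q->tail && q->completed[q->head])
		q->head++;
	fifo_update_timer(q);        /* re-arm for new head, or disarm */
}

/* True if the oldest in-flight request has passed its deadline. */
static bool fifo_expired(const struct req_fifo *q, unsigned long now)
{
	return q->timer_armed && now >= q->timer_expires;
}
```

In this model, completing 1000 requests in submission order costs 1001 timer updates (one initial arm, then one re-arm per completion), whereas a periodic check fires once per FUSE_TIMEOUT_TIMER_FREQ regardless of load. Completions of non-head requests are free, so the real cost depends on how often the oldest request is the one that completes.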

So, a silly question: can we maybe not do that at all?

What I'm thinking about is: what if, instead of implementing a fuse
watchdog and tracking jiffies per request, we switched to timeout-aware
operations and used what's already in the kernel?  E.g. instead of
wait_event() we'd use wait_event_timeout() and configure the timeout per
connection (also bringing the current hung-task watchdog timeout value
into the equation), using MAX_SCHEDULE_TIMEOUT as the default (similar
to what the core kernel does).  The first request that times out kills
its siblings and the connection.
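In user space the closest analogue of wait_event_timeout() is pthread_cond_timedwait(), so the proposal can be sketched as below. This is a minimal illustration under stated assumptions, not a FUSE patch: `struct conn`, `struct req`, `req_wait()`, `req_complete()` and `timeout_ms` are hypothetical names, and a `timeout_ms` of 0 stands in for MAX_SCHEDULE_TIMEOUT (wait forever). As with wait_event_timeout(), which returns 0 when the timeout elapsed with the condition still false, the waiter decides on its own that it timed out; the first one to do so aborts the connection and wakes every sibling.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stdbool.h>
#include <time.h>

/* Hypothetical per-connection state; not the real struct fuse_conn. */
struct conn {
	pthread_mutex_t lock;
	pthread_cond_t waitq;  /* shared by every request on the connection */
	long timeout_ms;       /* 0 = wait forever (cf. MAX_SCHEDULE_TIMEOUT) */
	bool aborted;
};

struct req {
	bool finished;
};

/* Called by the "server" side when a reply arrives. */
static void req_complete(struct conn *fc, struct req *r)
{
	pthread_mutex_lock(&fc->lock);
	r->finished = true;
	pthread_cond_broadcast(&fc->waitq);
	pthread_mutex_unlock(&fc->lock);
}

/*
 * Wait for a reply using the connection's timeout.
 * Returns 0, -ETIMEDOUT (this waiter timed out and aborted the
 * connection) or -ECONNABORTED (a sibling already aborted it).
 */
static int req_wait(struct conn *fc, struct req *r)
{
	struct timespec deadline;
	int err = 0, ret;

	/* pthread condvars use CLOCK_REALTIME unless configured otherwise */
	clock_gettime(CLOCK_REALTIME, &deadline);
	deadline.tv_sec += fc->timeout_ms / 1000;
	deadline.tv_nsec += (fc->timeout_ms % 1000) * 1000000L;
	if (deadline.tv_nsec >= 1000000000L) {
		deadline.tv_sec++;
		deadline.tv_nsec -= 1000000000L;
	}

	pthread_mutex_lock(&fc->lock);
	while (!r->finished && !fc->aborted && err == 0) {
		if (fc->timeout_ms == 0)
			pthread_cond_wait(&fc->waitq, &fc->lock);
		else
			err = pthread_cond_timedwait(&fc->waitq, &fc->lock,
						     &deadline);
	}
	if (r->finished) {
		ret = 0;                 /* reply won the race with the timeout */
	} else if (fc->aborted) {
		ret = -ECONNABORTED;     /* a sibling timed out first */
	} else {
		/* First timeout on this connection: kill it, wake siblings. */
		fc->aborted = true;
		pthread_cond_broadcast(&fc->waitq);
		ret = -ETIMEDOUT;
	}
	pthread_mutex_unlock(&fc->lock);
	return ret;
}
```

The appeal of this shape is that timers exist only for tasks that actually sleep, and nothing fires on a healthy connection. One thing the sketch does not cover is a request that nobody sleeps on (FUSE also has background requests), which a wait-side timeout alone would presumably not catch.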