Re: [PATCH RESEND v9 2/3] fuse: add optional kernel-enforced timeout for requests

Joanne Koong <joannelkoong@xxxxxxxxx> · Mon, 2 Dec 2024 21:11:57 -0800

On Mon, Dec 2, 2024 at 8:31 PM Sergey Senozhatsky
<senozhatsky@xxxxxxxxxxxx> wrote:
>
> On (24/12/02 11:29), Joanne Koong wrote:
> > > >> In those cases 1 minute fuse timeout will overshot HUNG_TASK_TIMEOUT
> > > >> and then the question is whether HUNG_TASK_PANIC is set.
> > > >>
> > > >> On the other hand, setups that set much lower timeout than
> > > >> DEFAULT_HUNG_TASK_TIMEOUT=120 will have extra CPU activities regardless,
> > > >> just because watchdogs will run more often.
> > > >>
> > > >> Tomasz, any opinions?
> > > >
> > > > First of all, thanks everyone for looking into this.
> >
> > Hi Sergey and Tomasz,
> >
> > Sorry for the late reply - I was out the last couple of days. Thanks
> > Bernd for weighing in and answering the questions!
> >
> > > >
> > > > How about keeping a list of requests in the FIFO order (in other
> > > > words: first entry is the first to timeout) and whenever the first
> > > > entry is being removed from the list (aka the request actually
> > > > completes), re-arming the timer to the timeout of the next request in
> > > > the list? This way we don't really have any timer firing unless there
> > > > is really a request that timed out.
> >
> > I think the issue with this is that we likely would end up wasting
> > more cpu cycles. For a busy FUSE server, there could be hundreds
> > (thousands?) of requests that happen within the span of
> > FUSE_TIMEOUT_TIMER_FREQ seconds.
>
> So, a silly question - can we not do that maybe?
>
> What I'm thinking about is what if instead of implementing fuse-watchdog
> and tracking jiffies per request we'd switch to timeout aware operations
> and use what's already in the kernel?  E.g. instead of wait_event() we'd
> use wait_event_timeout() and would configure timeout per connection
> (also bringing in current hung-task-watchdog timeout value into the
> equation), using MAX_SCHEDULE_TIMEOUT as a default (similarly to what
> core kernel does).  The first req that timeouts kills its siblings and
> the connection.

Using timeout aware operations like wait_event_timeout() associates a
timer per request (see schedule_timeout()) and this approach was tried
in v6 [1] but the overhead of having a timer per request showed about
a 1.5% drop in throughput [1], which is why we ended up pivoting to a
periodic watchdog timer that triggers at set intervals.

Thanks,
Joanne

[1] https://lore.kernel.org/linux-fsdevel/CAJnrk1bdyDq+4jo29ZbyjdcbFiU2qyCGGbYbqQc_G23+B_Xe_Q@xxxxxxxxxxxxxx/