On Mon, Dec 2, 2024 at 8:31 PM Sergey Senozhatsky <senozhatsky@xxxxxxxxxxxx> wrote:
>
> On (24/12/02 11:29), Joanne Koong wrote:
> > > >> In those cases 1 minute fuse timeout will overshot HUNG_TASK_TIMEOUT
> > > >> and then the question is whether HUNG_TASK_PANIC is set.
> > > >>
> > > >> On the other hand, setups that set much lower timeout than
> > > >> DEFAULT_HUNG_TASK_TIMEOUT=120 will have extra CPU activities regardless,
> > > >> just because watchdogs will run more often.
> > > >>
> > > >> Tomasz, any opinions?
> > > >
> > > > First of all, thanks everyone for looking into this.
> >
> > Hi Sergey and Tomasz,
> >
> > Sorry for the late reply - I was out the last couple of days. Thanks
> > Bernd for weighing in and answering the questions!
> >
> > > >
> > > > How about keeping a list of requests in the FIFO order (in other
> > > > words: first entry is the first to timeout) and whenever the first
> > > > entry is being removed from the list (aka the request actually
> > > > completes), re-arming the timer to the timeout of the next request in
> > > > the list? This way we don't really have any timer firing unless there
> > > > is really a request that timed out.
> >
> > I think the issue with this is that we likely would end up wasting
> > more cpu cycles. For a busy FUSE server, there could be hundreds
> > (thousands?) of requests that happen within the span of
> > FUSE_TIMEOUT_TIMER_FREQ seconds.
>
> So, a silly question - can we not do that maybe?
>
> What I'm thinking about is what if instead of implementing fuse-watchdog
> and tracking jiffies per request we'd switch to timeout aware operations
> and use what's already in the kernel? E.g. instead of wait_event() we'd
> use wait_event_timeout() and would configure timeout per connection
> (also bringing in current hung-task-watchdog timeout value into the
> equation), using MAX_SCHEDULE_TIMEOUT as a default (similarly to what
> core kernel does). The first req that timeouts kills its siblings and
> the connection.

Using timeout aware operations like wait_event_timeout() associates a
timer per request (see schedule_timeout()) and this approach was tried
in v6 [1] but the overhead of having a timer per request showed about
a 1.5% drop in throughput [1], which is why we ended up pivoting to a
periodic watchdog timer that triggers at set intervals.

Thanks,
Joanne

[1] https://lore.kernel.org/linux-fsdevel/CAJnrk1bdyDq+4jo29ZbyjdcbFiU2qyCGGbYbqQc_G23+B_Xe_Q@xxxxxxxxxxxxxx/
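
For readers following the thread, the periodic-watchdog idea can be
sketched in userspace C. This is only an illustration under stated
assumptions, not code from the patch series: struct conn, conn_init(),
conn_queue(), conn_complete_oldest() and conn_watchdog() are hypothetical
names, and the fixed-size FIFO and the abstract "tick" clock are
simplifications. What it tries to show is that queueing and completing a
request arm and cancel no timer, while one periodic check per connection
only has to look at the oldest in-flight request.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical userspace model; none of these names come from the kernel. */
#define MAX_REQS 8

struct conn {
	unsigned long start[MAX_REQS]; /* queue tick of each in-flight request, FIFO order */
	size_t head;                   /* index of the oldest in-flight request */
	size_t count;                  /* number of in-flight requests */
	unsigned long timeout;         /* per-connection request timeout, in ticks */
	bool aborted;                  /* set once any request has timed out */
};

static void conn_init(struct conn *c, unsigned long timeout)
{
	c->head = 0;
	c->count = 0;
	c->timeout = timeout;
	c->aborted = false;
}

/* Queue a request: O(1), only a timestamp is recorded, no timer is armed. */
static void conn_queue(struct conn *c, unsigned long now)
{
	assert(c->count < MAX_REQS);
	c->start[(c->head + c->count) % MAX_REQS] = now;
	c->count++;
}

/* Complete the oldest request: O(1), there is no timer to cancel. */
static void conn_complete_oldest(struct conn *c)
{
	assert(c->count > 0);
	c->head = (c->head + 1) % MAX_REQS;
	c->count--;
}

/*
 * Periodic watchdog body, meant to be called once per watchdog interval.
 * With a single timeout per connection, FIFO order equals expiry order,
 * so checking the oldest request is sufficient. Returns true once the
 * connection is considered aborted.
 */
static bool conn_watchdog(struct conn *c, unsigned long now)
{
	if (c->count > 0 && now - c->start[c->head] >= c->timeout)
		c->aborted = true;
	return c->aborted;
}
```

In this model a timed-out request is noticed only at the next watchdog
call, so detection can lag the configured timeout by up to one watchdog
interval; that delay is the price for keeping the per-request paths free
of timer operations.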