On Tue, Dec 10, 2024 at 4:07 PM Bernd Schubert <bernd.schubert@xxxxxxxxxxx> wrote:
>
> On 12/10/24 18:16, etmartin4313@xxxxxxxxx wrote:
> > From: Etienne Martineau <etmartin4313@xxxxxxxxx>
> >
> > This patch aborts the connection if HUNG_TASK_PANIC is set and a FUSE
> > server is getting stuck for too long.
> >
> > Without this patch, an unresponsive / buggy / malicious FUSE server can
> > leave the clients in D state for a long period of time and, on systems
> > where HUNG_TASK_PANIC is set, trigger a catastrophic reload.
> >
> > So, if HUNG_TASK_PANIC checking is enabled, we should wake up periodically
> > to abort connections that exceed the timeout value, which is defined to be
> > half the HUNG_TASK_TIMEOUT period, which keeps overhead low.
> >
> > This patch introduces a time-sorted list of requests waiting for an
> > answer, to minimize the overhead.
> >
> > When HUNG_TASK_PANIC is enabled, there is a timeout check per connection
> > that runs at low frequency, and only if there are active FUSE requests
> > pending.
> >
> > A FUSE client can get into D state as follows (see Scenario #1 / #2 below):
> >  1) request_wait_answer() -> wait_event() is UNINTERRUPTIBLE
> >     OR
> >  2) request_wait_answer() -> wait_event_(interruptible / killable) is head
> >     of line blocking for subsequent clients accessing the same file
> >
>
> I don't think that will help you for fuse background requests.
>
> [422820.431981] INFO: task dd:1590644 blocked for more than 120 seconds.
> [422820.436556]       Not tainted 6.13.0-rc1+ #92
> [422820.439189] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> [422820.446822] task:dd state:D stack:27440 pid:1590644 tgid:1590644 ppid:1590478 flags:0x00000002
> [422820.456782] Call Trace:
> [422820.459467]  <TASK>
> [422820.461667]  __schedule+0x1b42/0x25b0
> [422820.465312]  schedule+0xb5/0x260
> [422820.468568]  schedule_preempt_disabled+0x19/0x30
> [422820.473033]  rwsem_down_write_slowpath+0x8a6/0x12b0
> [422820.477644]  ? generic_file_write_iter+0x82/0x240
> [422820.481774]  down_write+0x16f/0x1a0
> [422820.486756]  generic_file_write_iter+0x82/0x240
> [422820.490412]  ? fuse_file_read_iter+0x490/0x490 [fuse]
> [422820.493021]  vfs_write+0x7c8/0xb70
> [422820.494389]  ? fuse_file_read_iter+0x490/0x490 [fuse]
> [422820.497003]  ksys_write+0xce/0x170
> [422820.500110]  do_syscall_64+0x81/0x120
> [422820.502941]  ? irqentry_exit_to_user_mode+0x133/0x180
> [422820.505504]  entry_SYSCALL_64_after_hwframe+0x4b/0x53
>
>
> Joanne's timeout patches are more generic and handle these as well.
>
>
> Thanks,
> Bernd

Thanks for pointing out this scenario.
It looks like similar logic can be applied to all requests, including
background requests.

Something like:

  fuse_request_init()
      /* Start the timer if fc->num_waiting == 1 */
  fuse_put_request()
      /* Stop the timer if fc->num_waiting == 0 */

I'll experiment some more on that angle and come back.

Thanks,
Etienne
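
For illustration only, the start/stop idea above can be modeled in plain userspace C. This is a rough sketch under stated assumptions, not kernel code and not the eventual patch: struct conn, req_init(), req_put() and timer_armed are hypothetical stand-ins for the connection, fuse_request_init(), fuse_put_request() and a per-connection timeout timer. Locking and atomics are left out on purpose.

```c
/*
 * Userspace model only. All names here are hypothetical stand-ins:
 * conn ~ the FUSE connection, req_init() ~ fuse_request_init(),
 * req_put() ~ fuse_put_request(), timer_armed ~ the timeout timer.
 * No locking or atomics; a real implementation would need them.
 */
#include <assert.h>
#include <stdbool.h>

struct conn {
	unsigned int num_waiting;	/* requests not yet released */
	bool timer_armed;		/* models the periodic timeout check */
};

/* Account for a new request; arm the timer on the 0 -> 1 transition. */
static void req_init(struct conn *fc)
{
	if (++fc->num_waiting == 1)
		fc->timer_armed = true;
}

/* Release a request; disarm the timer on the 1 -> 0 transition. */
static void req_put(struct conn *fc)
{
	assert(fc->num_waiting > 0);	/* a put without a matching init is a bug */
	if (--fc->num_waiting == 0)
		fc->timer_armed = false;
}
```

In this model the timer stays armed for as long as at least one request of any kind is outstanding, so the timeout check does no work on an idle connection.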