Re: [PATCH] fuse: Abort connection if FUSE server get stuck

On Tue, Dec 10, 2024 at 4:07 PM Bernd Schubert
<bernd.schubert@xxxxxxxxxxx> wrote:
>
>
>
> On 12/10/24 18:16, etmartin4313@xxxxxxxxx wrote:
> > From: Etienne Martineau <etmartin4313@xxxxxxxxx>
> >
> > This patch aborts the connection if HUNG_TASK_PANIC is set and a FUSE
> > server has been stuck for too long.
> >
> > Without this patch, an unresponsive / buggy / malicious FUSE server can
> > leave the clients in D state for a long period of time and, on systems
> > where HUNG_TASK_PANIC is set, trigger a catastrophic reload.
> >
> > So, if HUNG_TASK_PANIC checking is enabled, we should wake up periodically
> > to abort connections that exceed the timeout value, which is defined to be
> > half the HUNG_TASK_TIMEOUT period. This keeps overhead low.
> >
> > This patch introduces a time-sorted list of requests waiting for an
> > answer, which minimizes the overhead of the check.
> >
> > When HUNG_TASK_PANIC is enabled there is a per-connection timeout check
> > that runs at low frequency, and only while there are active FUSE requests
> > pending.
> >
> > A FUSE client can get into D state as such ( see below Scenario #1 / #2 )
> >  1) request_wait_answer() -> wait_event() is UNINTERRUPTIBLE
> >     OR
> >  2) request_wait_answer() -> wait_event_(interruptible / killable) is head
> >     of line blocking for subsequent clients accessing the same file
>
>
> I don't think that will help you for fuse background requests.
>
> [422820.431981] INFO: task dd:1590644 blocked for more than 120 seconds.
> [422820.436556]       Not tainted 6.13.0-rc1+ #92
> [422820.439189] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> [422820.446822] task:dd              state:D stack:27440 pid:1590644 tgid:1590644 ppid:1590478 flags:0x00000002
> [422820.456782] Call Trace:
> [422820.459467]  <TASK>
> [422820.461667]  __schedule+0x1b42/0x25b0
> [422820.465312]  schedule+0xb5/0x260
> [422820.468568]  schedule_preempt_disabled+0x19/0x30
> [422820.473033]  rwsem_down_write_slowpath+0x8a6/0x12b0
> [422820.477644]  ? generic_file_write_iter+0x82/0x240
> [422820.481774]  down_write+0x16f/0x1a0
> [422820.486756]  generic_file_write_iter+0x82/0x240
> [422820.490412]  ? fuse_file_read_iter+0x490/0x490 [fuse]
> [422820.493021]  vfs_write+0x7c8/0xb70
> [422820.494389]  ? fuse_file_read_iter+0x490/0x490 [fuse]
> [422820.497003]  ksys_write+0xce/0x170
> [422820.500110]  do_syscall_64+0x81/0x120
> [422820.502941]  ? irqentry_exit_to_user_mode+0x133/0x180
> [422820.505504]  entry_SYSCALL_64_after_hwframe+0x4b/0x53
>
>
> Joanne's timeout patches are more generic and handle these as well.
>
>
> Thanks,
> Bernd

Thanks for pointing out this scenario.
It looks like similar logic can be applied to all requests, including
background (bg) requests. Something like:
   fuse_request_init() /* Start the timer if fc->num_waiting == 1 */
   fuse_put_request() /* Stop the timer if fc->num_waiting == 0 */
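
To make the idea concrete, here is a minimal userspace sketch of that
counter-driven arming. It is not kernel code: struct conn_sketch, the
timer_armed flag and the two *_sketch helpers are hypothetical stand-ins.
A real change would hook the existing fc->num_waiting accounting, use a
kernel timer or delayed work instead of a flag, and need proper locking.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the parts of struct fuse_conn used here. */
struct conn_sketch {
	unsigned int num_waiting; /* requests currently outstanding */
	bool timer_armed;         /* stands in for a timer / delayed work */
};

/* Models the request-init side: arm the timer on the 0 -> 1 transition. */
static void request_init_sketch(struct conn_sketch *fc)
{
	if (++fc->num_waiting == 1)
		fc->timer_armed = true;
}

/* Models the put-request side: disarm the timer on the 1 -> 0 transition. */
static void put_request_sketch(struct conn_sketch *fc)
{
	assert(fc->num_waiting > 0); /* an unbalanced put is a caller bug */
	if (--fc->num_waiting == 0)
		fc->timer_armed = false;
}
```

Since every request type would go through the same counter, the periodic
check only runs while at least one request (foreground or background) is
outstanding.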
I'll experiment some more along those lines and report back.
Thanks,
Etienne




