On Wed, 27 Jul 2022 at 07:20, Nikhil Kshirsagar <nkshirsagar@xxxxxxxxx> wrote:
>
> Hello Mikolos and Dmitri!
>
> I'm trying to debug a fuse-overlayfs hang in the Ubuntu kernel, with versions,
>
> fuse_overlayfs: 1.7.1-1 (universe)
> kernel: 5.15.0-40-generic (server)
>
> This happens when fuse-overlayfs
> (https://github.com/containers/fuse-overlayfs) is stacked on top of
> squashfuse (https://github.com/vasi/squashfuse) to allow users to
> quickly start a container from a squashfs file without any privileges.
>
> The hang looks like this
>
> Jul 26 17:46:31 kernel: INFO: task fuse-overlayfs:326111 blocked for more than 120 seconds.
> Jul 26 17:46:31 kernel: Tainted: P OE 5.15.0-40-generic #43-Ubuntu
> Jul 26 17:46:31 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> Jul 26 17:46:31 kernel: task:fuse-overlayfs state:D stack: 0 pid:326111 ppid:326103 flags:0x00000002
> Jul 26 17:46:31 kernel: Call Trace:
> Jul 26 17:46:31 kernel: <TASK>
> Jul 26 17:46:31 kernel: __schedule+0x23d/0x590
> Jul 26 17:46:31 kernel: ? update_load_avg+0x82/0x620
> Jul 26 17:46:31 kernel: schedule+0x4e/0xb0
> Jul 26 17:46:31 kernel: schedule_preempt_disabled+0xe/0x10
> Jul 26 17:46:31 kernel: __mutex_lock.constprop.0+0x263/0x490
> Jul 26 17:46:31 kernel: ? kmem_cache_alloc+0x1ab/0x2e0
> Jul 26 17:46:31 kernel: __mutex_lock_slowpath+0x13/0x20
> Jul 26 17:46:31 kernel: mutex_lock+0x34/0x40
> Jul 26 17:46:31 kernel: fuse_lock_inode+0x2f/0x40
> Jul 26 17:46:31 kernel: fuse_lookup+0x48/0x1b0
> Jul 26 17:46:31 kernel: ? d_alloc_parallel+0x235/0x4b0
> Jul 26 17:46:31 kernel: ? __legitimize_path+0x2d/0x60
> Jul 26 17:46:31 kernel: __lookup_slow+0x81/0x150
> Jul 26 17:46:31 kernel: walk_component+0x141/0x1b0
> Jul 26 17:46:31 kernel: link_path_walk.part.0.constprop.0+0x23b/0x360
> Jul 26 17:46:31 kernel: ? path_init+0x2bc/0x3e0
> Jul 26 17:46:31 kernel: path_lookupat+0x3e/0x1b0
> Jul 26 17:46:31 kernel: filename_lookup+0xcf/0x1d0
> Jul 26 17:46:31 kernel: ? __check_object_size+0x19/0x20
> Jul 26 17:46:31 kernel: ? strncpy_from_user+0x44/0x140
> Jul 26 17:46:31 kernel: ? getname_flags.part.0+0x4c/0x1b0
> Jul 26 17:46:31 kernel: user_path_at_empty+0x3f/0x60
> Jul 26 17:46:31 kernel: path_getxattr+0x4a/0xb0
> Jul 26 17:46:31 kernel: ? __secure_computing+0xa5/0x110
> Jul 26 17:46:31 kernel: __x64_sys_lgetxattr+0x21/0x30
> Jul 26 17:46:31 kernel: do_syscall_64+0x59/0xc0
> Jul 26 17:46:31 kernel: ? do_syscall_64+0x69/0xc0
> Jul 26 17:46:31 kernel: ? do_syscall_64+0x69/0xc0
> Jul 26 17:46:31 kernel: ? irqentry_exit+0x19/0x30
> Jul 26 17:46:31 kernel: ? exc_page_fault+0x89/0x160
> Jul 26 17:46:31 kernel: ? asm_exc_page_fault+0x8/0x30
> Jul 26 17:46:31 kernel: entry_SYSCALL_64_after_hwframe+0x44/0xae
> Jul 26 17:46:31 kernel: RIP: 0033:0x7ffff7e6d2ae
> Jul 26 17:46:31 kernel: RSP: 002b:00007fffffff7528 EFLAGS: 00000202 ORIG_RAX: 00000000000000c0
> Jul 26 17:46:31 kernel: RAX: ffffffffffffffda RBX: 000055555556d6f0 RCX: 00007ffff7e6d2ae
> Jul 26 17:46:31 kernel: RDX: 00007fffffff8570 RSI: 0000555555566190 RDI: 00007fffffff7530
> Jul 26 17:46:31 kernel: RBP: 0000555555566190 R08: 0000000000000010 R09: 0000555555579cf0
> Jul 26 17:46:31 kernel: R10: 0000000000000010 R11: 0000000000000202 R12: 00007fffffff8570
> Jul 26 17:46:31 kernel: R13: 0000000000000010 R14: 00007fffffff7530 R15: 0000000000000000
> Jul 26 17:46:31 kernel: </TASK>
>
> Seems to me the &get_fuse_inode(inode)->mutex cannot be locked,
>
> bool fuse_lock_inode(struct inode *inode)
> {
>         bool locked = false;
>
>         if (!get_fuse_conn(inode)->parallel_dirops) {
>                 mutex_lock(&get_fuse_inode(inode)->mutex);
>                 locked = true;
>         }
>
>         return locked;
> }
>
> Please would you be able to help me understand if this is a
> known/reported issue, and has any fix/patch?
>
> Regards,
> Nikhil.

+linux-fsdevel, syzkaller

Hi Nikhil,

Re known bugs: we have 5 open bugs that mention "fuse" in the title,
including some task hangs with reproducers:
https://syzkaller.appspot.com/upstream
These may be the easiest to check first.

There were also some fixed task hangs in fuse:
https://syzkaller.appspot.com/upstream/fixed
But they look old enough, so fixes are probably already in your kernel.
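
For anyone following the thread, the locking pattern in the quoted
fuse_lock_inode() can be modelled in userspace. This is only a sketch
under assumptions, not kernel code: every model_* name is a hypothetical
stand-in, a pthread mutex replaces the kernel mutex, and
model_unlock_inode() is an assumed counterpart to the quoted lock helper.
It illustrates that with parallel_dirops unset, a second lookup on the
same directory inode has to wait until the first one releases the mutex.
That would be consistent with the task stuck in mutex_lock() in the trace
if the current holder is itself waiting on something else, though the
trace alone does not show who holds the mutex.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical stand-ins for struct fuse_conn / struct fuse_inode. */
struct model_conn {
    bool parallel_dirops;
};

struct model_inode {
    struct model_conn *conn;
    pthread_mutex_t mutex;
};

/* Same shape as the quoted fuse_lock_inode(): lock only when
 * parallel directory operations are not enabled. */
static bool model_lock_inode(struct model_inode *inode)
{
    bool locked = false;

    if (!inode->conn->parallel_dirops) {
        pthread_mutex_lock(&inode->mutex);
        locked = true;
    }

    return locked;
}

/* Assumed counterpart: release only if the lock was actually taken. */
static void model_unlock_inode(struct model_inode *inode, bool locked)
{
    if (locked)
        pthread_mutex_unlock(&inode->mutex);
}

/* Probe whether another lookup on this inode would have to wait.
 * Uses trylock so the probe itself never blocks. */
static bool model_second_lookup_would_block(struct model_inode *inode)
{
    if (inode->conn->parallel_dirops)
        return false;
    if (pthread_mutex_trylock(&inode->mutex) == 0) {
        pthread_mutex_unlock(&inode->mutex);
        return false;
    }
    return true;
}

/* Walks through both settings of parallel_dirops; returns 0 on success. */
static int model_demo(void)
{
    struct model_conn conn = { .parallel_dirops = false };
    struct model_inode dir = { .conn = &conn };

    pthread_mutex_init(&dir.mutex, NULL);

    /* First lookup is in flight and holds the per-inode mutex. */
    bool locked = model_lock_inode(&dir);
    assert(locked);
    assert(model_second_lookup_would_block(&dir));

    /* Once it finishes, the next lookup can proceed. */
    model_unlock_inode(&dir, locked);
    assert(!model_second_lookup_would_block(&dir));

    /* With parallel_dirops set, no mutex is taken at all. */
    conn.parallel_dirops = true;
    locked = model_lock_inode(&dir);
    assert(!locked);
    assert(!model_second_lookup_would_block(&dir));
    model_unlock_inode(&dir, locked);

    pthread_mutex_destroy(&dir.mutex);
    return 0;
}
```

Calling model_demo() from a small main() and building with
"cc -pthread" is enough to try it. The point of the model is only the
serialization: it says nothing about why the holder in the real hang
does not make progress.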