On Mon, Aug 21, 2023 at 04:24:00PM +0200, Miklos Szeredi wrote:
> On Tue, 15 Aug 2023 at 00:36, Tycho Andersen <tycho@tycho.pizza> wrote:
> >
> > On Mon, Aug 14, 2023 at 04:35:56PM +0200, Miklos Szeredi wrote:
> > > On Mon, 14 Aug 2023 at 16:00, Tycho Andersen <tycho@tycho.pizza> wrote:
> > >
> > > > It seems like we really do need to wait here. I guess that means we
> > > > need some kind of exit-proof wait?
> > >
> > > Could you please recap the original problem?
> >
> > Sure, the symptom is a deadlock, something like:
> >
> > # cat /proc/1528591/stack
> > [<0>] do_wait+0x156/0x2f0
> > [<0>] kernel_wait4+0x8d/0x140
> > [<0>] zap_pid_ns_processes+0x104/0x180
> > [<0>] do_exit+0xa41/0xb80
> > [<0>] do_group_exit+0x3a/0xa0
> > [<0>] __x64_sys_exit_group+0x14/0x20
> > [<0>] do_syscall_64+0x37/0xb0
> > [<0>] entry_SYSCALL_64_after_hwframe+0x44/0xae
> >
> > which is stuck waiting for:
> >
> > # cat /proc/1544574/stack
> > [<0>] request_wait_answer+0x12f/0x210
> > [<0>] fuse_simple_request+0x109/0x2c0
> > [<0>] fuse_flush+0x16f/0x1b0
> > [<0>] filp_close+0x27/0x70
> > [<0>] put_files_struct+0x6b/0xc0
> > [<0>] do_exit+0x360/0xb80
> > [<0>] do_group_exit+0x3a/0xa0
> > [<0>] get_signal+0x140/0x870
> > [<0>] arch_do_signal_or_restart+0xae/0x7c0
> > [<0>] exit_to_user_mode_prepare+0x10f/0x1c0
> > [<0>] syscall_exit_to_user_mode+0x26/0x40
> > [<0>] do_syscall_64+0x46/0xb0
> > [<0>] entry_SYSCALL_64_after_hwframe+0x44/0xae
> >
> > I have a reproducer here:
> > https://github.com/tych0/kernel-utils/blob/master/fuse2/Makefile#L7
>
> The issue seems to be that the server process is recursing into the
> filesystem it is serving (nested_fsync()). It's quite easy to
> deadlock fuse this way, and I'm not sure why this would be needed for
> any server implementation. Can you explain?

I think the idea is that they're saving snapshots of their own threads to
the fs for debugging purposes.

Whether this is a sane thing to do or not, it doesn't seem like it should
deadlock pid ns destruction.

Tycho