On Fri, Jul 17, 2020 at 2:52 PM Matthew Wilcox <willy@xxxxxxxxxxxxx> wrote:
>
> On Fri, Jul 17, 2020 at 02:39:03PM +0200, Miklos Szeredi wrote:
> > On Fri, Jul 17, 2020 at 10:07 AM Paul Menzel <pmenzel@xxxxxxxxxxxxx> wrote:
> > > [105591.121285] INFO: task ls:21242 blocked for more than 120 seconds.
> > > [105591.121293] Not tainted 5.7.0-1-amd64 #1 Debian 5.7.6-1
> > > [105591.121295] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs"
> > > disables this message.
> > > [105591.121298] ls D 0 21242 778 0x00004004
> > > [105591.121304] Call Trace:
> > > [105591.121319] __schedule+0x2da/0x770
> > > [105591.121326] schedule+0x4a/0xb0
> > > [105591.121339] request_wait_answer+0x122/0x210 [fuse]
> > > [105591.121349] ? finish_wait+0x80/0x80
> > > [105591.121357] fuse_simple_request+0x198/0x290 [fuse]
> > > [105591.121366] fuse_do_getattr+0xcf/0x2c0 [fuse]
> > > [105591.121376] vfs_statx+0x96/0xe0
> > >
> > > The `ls` process cannot be killed. The SSHFS issue *Fuse sshfs blocks
> > > standby (Visual Studio Code?)* from 2018 already reported this for Linux
> > > 4.17, and the SSHFS developers asked to report this to the Linux kernel.
> >
> > This is a very old and fundamental issue. Theoretical solution for
> > killing the stuck process exists, but it's not trivial and since the
> > above mentioned workarounds work well in all cases it's not high
> > priority right now.
>
> What? All you need to do is return -EINTR from fuse_do_getattr() if
> there's a fatal signal. What "fundamental issue"?

TL;DR: the fundamental issue is not with getattr, but with ops that
hold locks. We could make an exception for ops that do not hold
locks, but it would not be a solution to the problem, and as I said,
this is something we can live with.

The fundamental issue is that a task killed while the userspace
filesystem is still performing that operation will release the vfs
lock and allow another op requiring that lock to be sent to the
userspace filesystem.
This may confuse the userspace filesystem, which may otherwise be
relying on that locking, and could quite possibly result in fs
corruption.

To fix this, we need to add shadow locking somewhere that duplicates
the vfs locks but is only released once userspace has finished
processing the request. The best place to put the shadow locks is
probably in the kernel.

Thanks,
Miklos
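
For illustration only, the race and the shadow-lock idea described above
can be modelled in a few lines of userspace Python. This is a toy model
under stated assumptions, not the FUSE code: the names UserspaceFS,
Kernel, start_op, kill_task, reply and run are all made up for this
sketch, and "locks" are plain sets keyed by inode number.

```python
class UserspaceFS:
    """Hypothetical stand-in for a FUSE daemon that assumes the kernel
    serializes operations on the same inode."""

    def __init__(self):
        self.in_flight = set()  # inodes with a request still being processed
        self.overlaps = 0       # times a second op arrived for a busy inode

    def begin(self, inode):
        if inode in self.in_flight:
            self.overlaps += 1  # the daemon's locking assumption was violated
        self.in_flight.add(inode)

    def finish(self, inode):
        self.in_flight.discard(inode)


class Kernel:
    """Hypothetical model of the kernel side; not real kernel behaviour."""

    def __init__(self, fs, use_shadow_lock):
        self.fs = fs
        self.use_shadow_lock = use_shadow_lock
        self.vfs_locked = set()     # released when the calling task returns
        self.shadow_locked = set()  # released only when userspace replies

    def start_op(self, inode):
        """Send an op to userspace; return False if it would have to wait."""
        if inode in self.vfs_locked or inode in self.shadow_locked:
            return False
        self.vfs_locked.add(inode)
        if self.use_shadow_lock:
            self.shadow_locked.add(inode)
        self.fs.begin(inode)
        return True

    def kill_task(self, inode):
        """Fatal signal: the caller returns early and drops the vfs lock,
        while the daemon is still working on the request."""
        self.vfs_locked.discard(inode)

    def reply(self, inode):
        """Userspace finished the request; every lock can go now."""
        self.fs.finish(inode)
        self.vfs_locked.discard(inode)
        self.shadow_locked.discard(inode)


def run(use_shadow_lock):
    fs = UserspaceFS()
    kernel = Kernel(fs, use_shadow_lock)
    assert kernel.start_op(1)          # task A sends an op on inode 1
    kernel.kill_task(1)                # task A is killed before the reply
    second_sent = kernel.start_op(1)   # task B tries an op on the same inode
    kernel.reply(1)                    # daemon finally answers A's request
    return second_sent, fs.overlaps


if __name__ == "__main__":
    print("no shadow lock:  ", run(False))
    print("with shadow lock:", run(True))
```

Without the shadow lock the second op reaches the daemon while the
first is still in flight; with it, the second op has to wait until the
daemon has replied. The model ignores everything that makes the real
problem non-trivial (lock types, nesting, aborted connections).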