On Mon, Jul 23, 2018 at 2:46 PM, Dmitry Vyukov <dvyukov@xxxxxxxxxx> wrote:
> On Mon, Jul 23, 2018 at 2:33 PM, Miklos Szeredi <miklos@xxxxxxxxxx> wrote:
>>>>> On Mon, Jul 23, 2018 at 9:59 AM, syzbot
>>>>> <syzbot+bb6d800770577a083f8c@xxxxxxxxxxxxxxxxxxxxxxxxx> wrote:
>>>>>> Hello,
>>>>>>
>>>>>> syzbot found the following crash on:
>>>>>>
>>>>>> HEAD commit:    d72e90f33aa4 Linux 4.18-rc6
>>>>>> git tree:       upstream
>>>>>> console output: https://syzkaller.appspot.com/x/log.txt?x=1324f794400000
>>>>>> kernel config:  https://syzkaller.appspot.com/x/.config?x=68af3495408deac5
>>>>>> dashboard link: https://syzkaller.appspot.com/bug?extid=bb6d800770577a083f8c
>>>>>> compiler:       gcc (GCC) 8.0.1 20180413 (experimental)
>>>>>> syzkaller repro: https://syzkaller.appspot.com/x/repro.syz?x=11564d1c400000
>>>>>> C reproducer:   https://syzkaller.appspot.com/x/repro.c?x=16fc570c400000
>>>>>
>>>>>
>>>>> Hi fuse maintainers,
>>>>>
>>>>> We are seeing a bunch of such deadlocks in fuse on syzbot. As far as I
>>>>> understand this is mostly working-as-intended (parts about deadlocks
>>>>> in Documentation/filesystems/fuse.txt). The intended way to resolve
>>>>> this is aborting connections via fusectl, right?
>>>>
>>>> Yes. Alternative is with "umount -f".
>>>>
>>>>> The doc says "Under
>>>>> the fuse control filesystem each connection has a directory named by a
>>>>> unique number". The question is: if I start a process and this process
>>>>> can mount fuse, how do I kill it? I mean: totally and certainly get
>>>>> rid of it right away? How do I find these unique numbers for the
>>>>> mounts it created?
>>>>
>>>> It is the device number found in st_dev for the mount. Other than
>>>> doing stat(2) it is possible to find out the device number by reading
>>>> /proc/$PID/mountinfo (third field).
>>>
>>> Thanks. I will try to figure out fusectl connection numbers and see if
>>> it's possible to integrate aborting into syzkaller.
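For reference, the lookup described above (third field of
/proc/$PID/mountinfo) can be sketched roughly as follows. This is a minimal
Python sketch, not code from fuse or syzkaller. The helper names
fuse_mounts() and abort_path() are hypothetical. It assumes that fusectl is
mounted at /sys/fs/fuse/connections, and that for mounts on an anonymous
device (major 0) the connection directory name equals the minor number.

```python
import os

# Usual fusectl mount point; an assumption, adjust if mounted elsewhere.
FUSECTL_ROOT = "/sys/fs/fuse/connections"


def fuse_mounts(mountinfo_text):
    """Yield (mount_point, fs_type, major, minor) for each FUSE mount.

    mountinfo_text is the content of /proc/$PID/mountinfo.
    """
    for line in mountinfo_text.splitlines():
        fields = line.split()
        try:
            # Optional fields end at a lone "-"; the fs type follows it.
            sep = fields.index("-", 6)
            fs_type = fields[sep + 1]
        except (ValueError, IndexError):
            continue  # malformed or truncated line
        if fs_type not in ("fuse", "fuseblk") and not fs_type.startswith("fuse."):
            continue
        # Third field is "major:minor" of the mount's device (st_dev).
        major, minor = (int(part) for part in fields[2].split(":"))
        yield fields[4], fs_type, major, minor


def abort_path(minor):
    """Path of the fusectl 'abort' file, assuming a major-0 device."""
    return os.path.join(FUSECTL_ROOT, str(minor), "abort")
```

Writing anything into the returned abort file should abort that connection,
per Documentation/filesystems/fuse.txt. Mounts with a non-zero major (for
example fuseblk) would need separate handling and are not covered here.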
>>>
>>>>> Taking into account that there is usually no
>>>>> operator attached to each server, I wonder if kernel could somehow
>>>>> auto-abort fuse on kill?
>>>>
>>>> Depends on what the fuse server is sleeping on. If it's trying to
>>>> acquire an inode lock (e.g. unlink(2)), which is classical way to
>>>> deadlock a fuse filesystem, then it will go into an uninterruptible
>>>> sleep. There's no way in which that process can be killed except to
>>>> force a release of the offending lock, which can only be done by
>>>> aborting the request that is being performed while holding that lock.
>>>
>>> I understand that it is not killed today, but I am asking if we can
>>> make it killable. It's all code that we can change, and if a human
>>> operator can do it, it can be done pure programmatically on kill too,
>>> right?
>>
>> Hmm, you mean if a process is in an uninterruptible sleep trying to
>> acquire a lock on a fuse filesystem and is killed, then the fuse
>> filesystem should be aborted?
>>
>> Even if we'd manage to implement that, it's a large backward
>> incompatibility risk.
>>
>> I don't argue that it can be done, but I would definitely argue *if*
>> it should be done.
>
>
> I understand that we should abort only if we are sure that it's
> actually deadlocked and there is no other way.
> So if fuse-user process is blocked on fuse lock, then we probably
> should do nothing. However, if the fuse-server is killed, then perhaps
> we could abort the connection at that point. Namely, if a process that
> has a fuse fd open is killed and it is the only process that shared
> this fd, then we could abort the connection on arrival of the kill
> signal (rather than wait until all its threads finish and then start
> closing all fd's, this is where we get the deadlock -- some of its
> threads won't finish). I don't know if such synchronous kill hook is
> available, though.
> If several processes shared the same fuse fd, then
> we could close the fd in each process on SIGKILL arrival, then when
> all of these processes are killed, fuse fd will be closed and we can
> abort the connection, which will un-deadlock all of these processes.
> Does this look any reasonable?

Biggest conceptual problem: your definition of fuse-server is weak.
Take the following example: process A is holding the fuse device fd
and is forwarding requests and replies to/from process B via a pipe.
So basically A is just a proxy that does nothing interesting, the
"real" server is B. But according to your definition B is not a
server, only A is.

And this is just a simple example, parts of the server might be on
different machines, etc... It's impossible to automatically detect if
a process is acting as a fuse server or not.

We could let the fuse server itself notify the kernel that it's a fuse
server. That might help in the cases where the deadlock is accidental,
but obviously not in the case when done by a malicious agent.

I'm not sure it's worth the effort. Also I have no idea how the
respective maintainers would take the idea of "kill hooks"... It
would probably be a lot of work for little gain.

Thanks,
Miklos