On Tue, Jul 24, 2018 at 5:17 PM, Miklos Szeredi <miklos@xxxxxxxxxx> wrote:
>>>>> Biggest conceptual problem: your definition of fuse-server is weak.
>>>>> Take the following example: process A is holding the fuse device fd
>>>>> and is forwarding requests and replies to/from process B via a pipe.
>>>>> So basically A is just a proxy that does nothing interesting, the
>>>>> "real" server is B. But according to your definition B is not a
>>>>> server, only A is.
>>>>
>>>> I proposed to abort fuse conn when all fuse device fd's are "killed"
>>>> (all processes having the fd opened are killed). So if _only_ process
>>>> B is killed, then, yes, it will still hang. However if A is killed or
>>>> both A and B (say, process group, everything inside of pid namespace,
>>>> etc) then the deadlock will be autoresolved without human
>>>> intervention.
>>>
>>> Okay, so you're saying:
>>>
>>> 1) when process gets SIGKILL and is uninterruptible sleep mark process as doomed
>>> 2) for a particular fuse instance find set of fuse device fd
>>> references that are in non-doomed tasks; if there are none then abort
>>> fuse instance
>>>
>>> Right?
>>
>>
>> Yes, something like this.
>> Perhaps checking for "uninterruptible sleep" is excessive. If it has
>> SIGKILL pending it's pretty much doomed already. This info should be
>> already available for tasks.
>> Not saying that it's better, but what I described was the other way
>> around: when a task killed it drops a reference to all opened fuse
>> fds, when the last fd is dropped, the connection can be aborted.
>
> struct task_struct {
>     [...]
>     struct files_struct *files;
>     [...]
> };
>
> struct files_struct {
>     [...]
>     struct fdtable __rcu *fdt;
>     [...]
> };
>
> struct fdtable {
>     [...]
>     struct file __rcu **fd; /* current fd array */
>     [...]
> };
>
> So there we have an array of pointers to struct files. Suppose we'd
> magically be able to find files that point to fuse devices upon
> receiving SIGKILL, what would we do with them? We can't close them:
> other tasks might still be pointing to the same files_struct.
>
> We could do a global search for non-doomed tasks referencing the same
> fuse device, but I have no clue how we'd go about doing that without
> racing with forks, fd sending, etc...

Good questions for which I don't have answers.

Maybe more waits in fuse need to be interruptible? E.g. request_wait_answer?
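
To make the "global search for non-doomed tasks" idea concrete, here is a rough userspace toy model of the check, not kernel code. Every name in it (toy_file, toy_files, toy_task, toy_should_abort, TOY_MAX_FDS) is made up for illustration. It ignores locking, RCU, and the races with fork and fd passing raised in the quoted message, so it only shows what the scan would have to decide, not how to do it safely.

```c
#include <stdbool.h>
#include <stddef.h>

#define TOY_MAX_FDS 4   /* hypothetical fixed-size fd table for the model */

/* Hypothetical stand-in for a struct file that refers to a fuse device. */
struct toy_file {
    int conn_id;        /* which fuse connection this file belongs to */
};

/* Hypothetical stand-in for files_struct: may be shared between tasks. */
struct toy_files {
    struct toy_file *fd[TOY_MAX_FDS];
};

/* Hypothetical stand-in for task_struct. */
struct toy_task {
    struct toy_files *files;   /* possibly shared fd table */
    bool doomed;               /* models "SIGKILL pending" */
};

/* True if any non-doomed task still holds an fd on connection conn_id. */
static bool toy_conn_has_live_ref(const struct toy_task *tasks, size_t ntasks,
                                  int conn_id)
{
    for (size_t i = 0; i < ntasks; i++) {
        if (tasks[i].doomed || tasks[i].files == NULL)
            continue;   /* doomed tasks do not keep the connection alive */
        for (size_t j = 0; j < TOY_MAX_FDS; j++) {
            const struct toy_file *f = tasks[i].files->fd[j];
            if (f != NULL && f->conn_id == conn_id)
                return true;
        }
    }
    return false;
}

/* Step 2 of the quoted proposal: abort when no live reference remains. */
static bool toy_should_abort(const struct toy_task *tasks, size_t ntasks,
                             int conn_id)
{
    return !toy_conn_has_live_ref(tasks, ntasks, conn_id);
}
```

With two tasks sharing one toy_files table, marking only one of them doomed leaves toy_should_abort() false, which is the "we can't close them" case: the fd table outlives the killed task. The model takes a consistent snapshot of all tasks for granted, and that snapshot is exactly what the quoted message says is hard to get.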