Re: INFO: task hung in fuse_reverse_inval_entry

Miklos Szeredi <miklos@xxxxxxxxxx> · Mon, 23 Jul 2018 17:09:15 +0200

On Mon, Jul 23, 2018 at 3:37 PM, Dmitry Vyukov <dvyukov@xxxxxxxxxx> wrote:
> On Mon, Jul 23, 2018 at 3:05 PM, Miklos Szeredi <miklos@xxxxxxxxxx> wrote:

>> Biggest conceptual problem: your definition of fuse-server is weak.
>> Take the following example: process A is holding the fuse device fd
>> and is forwarding requests and replies to/from process B via a pipe.
>> So basically A is just a proxy that does nothing interesting, the
>> "real" server is B.  But according to your definition B is not a
>> server, only A is.
>
> I proposed to abort fuse conn when all fuse device fd's are "killed"
> (all processes having the fd opened are killed). So if _only_ process
> B is killed, then, yes, it will still hang. However if A is killed or
> both A and B (say, process group, everything inside of pid namespace,
> etc) then the deadlock will be autoresolved without human
> intervention.

Okay, so you're saying:

1) when process gets SIGKILL and is uninterruptible sleep mark process as doomed
2) for a particular fuse instance find set of fuse device fd
references that are in non-doomed tasks; if there are none then abort
fuse instance

Right?

The above is not an implementation proposal, just to get us on the
same page regarding the concept.

>> And this is just a simple example, parts of the server might be on
>> different machines, etc...  It's impossible to automatically detect if
>> a process is acting as a fuse server or not.
>
> It does not seem we need the precise definition. If no one ever can
> write anything into the fd, we can safely abort the connection (?).

Seems to me so.

> If
> we don't, we can either get that the process exits normally and the
> connection is doomed anyway, so no difference in behavior, or we can
> get a deadlock.
>
>> We could let the fuse server itself notify the kernel that it's a fuse
>> server.  That might help in the cases where the deadlock is
>> accidental, but obviously not in the case when done by a malicious
>> agent.  I'm not sure it's worth the effort.   Also I have no idea how
>> the respective maintainers would take the idea of "kill hooks"...   It
>> would probably be a lot of work for little gain.
>
> What looks wrong to me here is that fuse is only (?) subsystem in
> kernel that stops SIGKILL from working and requires complex custom
> dance performed by a human operator (which is not necessary there at
> all). Say, if a process has opened a socket, whatever, I don't need to
> locate and abort something in socketctl fs, just SIGKILL. If a
> processes has opened a file, I don't need to locate the fd in /proc
> and abort it, just SIGKILL. If a process has created an ipc object, I
> don't need to do any special dance, just SIGKILL. fuse is somehow very
> special, if we have more such cases, it definitely won't scale.
> I understand that there can be implementation difficulties, but
> fundamentally that's how things should work -- choose target
> processes, kill, done, right?

Yes, it would be nice.

But I'm not sure it will fly due to implementation difficulties.  It's
definitely not  a high prio feature currently for me, but I'll happily
accept patches.

Thanks,
Miklos