Miklos Szeredi <miklos@xxxxxxxxxx> writes:

> On Mon, Nov 9, 2020 at 7:54 PM Eric W. Biederman <ebiederm@xxxxxxxxxxxx> wrote:
>>
>> Miklos Szeredi <miklos@xxxxxxxxxx> writes:
>>
>> > On Mon, Nov 9, 2020 at 1:48 PM Alexey Gladkov <gladkov.alexey@xxxxxxxxx> wrote:
>> >>
>> >> This patch removes one kind of the deadlocks inside the fuse daemon. The
>> >> problem appears when the fuse daemon itself makes a file operation on its
>> >> filesystem and receives a fatal signal.
>> >>
>> >> This deadlock can be interrupted via fusectl filesystem. But if you have
>> >> many fuse mountpoints, it will be difficult to figure out which
>> >> connection to break.
>> >>
>> >> This patch aborts the connection if the fuse server receives a fatal
>> >> signal.
>> >
>> > The patch itself might be acceptable, but I have some questions.
>> >
>> > The logic of this patch says:
>> >
>> > "If a task having the fuse device open in its fd table receives
>> > SIGKILL (and filesystem was initially mounted in a non-init user
>> > namespace), then abort the filesystem operation"
>> >
>> > You just say "server" instead of "task having the fuse device open in
>> > its fd table" which is sloppy to say the least.  It might also lead
>> > to regressions, although I agree that it's unlikely.
>> >
>> > Also how is this solving any security issue?  Just create the request
>> > loop using two fuse filesystems and the deadlock avoidance has just
>> > been circumvented.  So AFAICS "selling" this as a CVE fix is not
>> > appropriate.
>>
>> The original report came in with a CVE on it.  So referencing that CVE
>> seems reasonable.  Even if the issue isn't particularly serious.  It is
>> very annoying not to be able to kill processes with SIGKILL or the OOM
>> killer.
>>
>> You have a good point about the looping issue.  I wonder if there is a
>> way to enhance this comparatively simple approach to prevent the more
>> complex scenario you mention.
>
> Let's take a concrete example:
>
> - task A is "server" for fuse fs a
> - task B is "server" for fuse fs b
> - task C: chmod(/a/x, ...)
> - task A: read UNLINK request
> - task A: chmod(/b/x, ...)
> - task B: read UNLINK request
> - task B: chmod (/a/x, ...)
>
> Now B is blocking on i_mutex on x, A is waiting for reply from B, C
> is holding i_mutex on x and waiting for reply from A.
>
> At this point B is truly uninterruptible (and I'm not betting large
> sums on Al accepting killable VFS locks patches), so killing B is out.
>
> Killing A with this patch does nothing, since A does not have b's dev
> fd in its fdtable.
>
> Killing C again does nothing, since it has no fuse dev fd at all.
>
>> Does tweaking the code to close every connection represented by a fuse
>> file descriptor after a SIGKILL has been delivered create any problems?
>
> In the above example are you suggesting that SIGKILL on A would abort
> "a" from fs b's code?  Yeah, that would work, I guess.  Poking into
> another instance this way sounds pretty horrid, though.

Yes.  That is what I am suggesting.

Layering purity it does not have.  It is also fragile as it only
handles interactions between fuse instances.

The advantage is that it is a very small amount of code.  I think there
is enough care to get a small change like that in.  (With a big fat
comment describing why it is imperfect).

I don't know if there is enough care to get the general solution (you
describe below) implemented and merged in any kind of timely manner.

>> > What's the reason for making this user-ns only?  If we drop the
>> > security aspect, then I don't see any reason not to do this
>> > unconditionally.
>>
>>
>> > Also note, there's a proper solution for making fuse requests always
>> > killable, and that is to introduce a shadow locking that ensures
>> > correct fs operation in the face of requests that have returned and
>> > released their respective VFS locks.
>> > Now this would be a much more
>> > complex solution, but also a much more correct one, not having issues
>> > with correctly defining what a server is (which is not a solvable
>> > problem).
>>
>> Is this the solution that was removed at some point from fuse,
>> or are you talking about something else?
>>
>> I think you are talking about adding a set of fuse specific locks
>> so fuse does not need to rely on the vfs locks.  I don't quite have
>> enough insight to see that bigger problem so if you can expand in more
>> detail I would appreciate it.
>
> Okay, so the problem with making the wait_event() at the end of
> request_wait_answer() killable is that it would allow compromising the
> server's integrity by unlocking the VFS level lock (which protects the
> fs) while the server hasn't yet finished the request.
>
> The way this would be solvable is to add a fuse level lock for each
> VFS level lock.  That lock would be taken before the request is sent
> to userspace and would be released when the answer is received.
> Normally there would be zero contention on these shadow locks, but if
> a request is forcibly killed, then the VFS lock is released and the
> shadow lock now protects the filesystem.
>
> This wouldn't solve the case where a fuse fs is deadlocked on a VFS
> lock (e.g. task B), but would allow tasks blocked directly on a fuse
> filesystem to be killed (e.g. task A or C, both of which would unwind
> the deadlock).

Are we just talking about the inode lock here?

I am trying to figure out if this is a straightforward change.
Or if it will take a fair amount of work.

If the change is just wordy we can probably do the good version and
call fuse well and truly fixed.  But I don't currently see the problem
well enough to know what the good change would look like even on a
single code path.

Eric
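
To make the shadow lock description above concrete, here is a toy
userspace model of the lock ordering on a single inode.  It is only a
sketch of the idea as described in the quoted text, not kernel code and
not a proposed patch: plain flags stand in for real locks, everything
runs in one thread, and every name in it (toy_inode, toy_request_begin,
toy_request_killed, toy_request_answered, toy_demo) is hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model: flags stand in for locks; no real blocking happens. */
struct toy_inode {
	bool vfs_locked;    /* stands in for the VFS level lock */
	bool shadow_locked; /* hypothetical fuse level shadow lock */
};

enum toy_status {
	TOY_STARTED,           /* both locks taken, request sent */
	TOY_BLOCKED_ON_VFS,    /* would sleep on the VFS lock */
	TOY_BLOCKED_ON_SHADOW, /* would sleep on the shadow lock */
};

/* Take the VFS lock, then the shadow lock, before sending the request. */
static enum toy_status toy_request_begin(struct toy_inode *ino)
{
	if (ino->vfs_locked)
		return TOY_BLOCKED_ON_VFS;
	if (ino->shadow_locked)
		return TOY_BLOCKED_ON_SHADOW;
	ino->vfs_locked = true;
	ino->shadow_locked = true;
	return TOY_STARTED;
}

/*
 * The caller received a fatal signal while waiting for the answer.
 * It drops the VFS lock and returns; the shadow lock stays held
 * because the server has not finished the request yet.
 */
static void toy_request_killed(struct toy_inode *ino)
{
	ino->vfs_locked = false;
}

/*
 * The server's answer arrived.  The shadow lock is released; the VFS
 * lock is released only if the caller was still waiting for the answer.
 */
static void toy_request_answered(struct toy_inode *ino, bool caller_waiting)
{
	ino->shadow_locked = false;
	if (caller_waiting)
		ino->vfs_locked = false;
}

/* Walk through the "request forcibly killed" case; returns 0 on success. */
int toy_demo(void)
{
	struct toy_inode x = { false, false };

	assert(toy_request_begin(&x) == TOY_STARTED);
	toy_request_killed(&x);

	/* The VFS lock is free again, but the shadow lock still protects x. */
	assert(!x.vfs_locked && x.shadow_locked);
	assert(toy_request_begin(&x) == TOY_BLOCKED_ON_SHADOW);

	/* Only once the server answers can the next request start. */
	toy_request_answered(&x, false);
	assert(toy_request_begin(&x) == TOY_STARTED);
	return 0;
}
```

In the normal path the two flags are set and cleared together, which
matches the "zero contention" remark.  The model says nothing about how
a waiter on the shadow lock would itself be made killable, or about
which VFS locks besides the inode lock would need a shadow.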