On Mon, Feb 6, 2017 at 10:57 AM, Al Viro <viro@xxxxxxxxxxxxxxxxxx> wrote:
> On Mon, Feb 06, 2017 at 10:08:06AM +0100, Miklos Szeredi wrote:
>
>> Yes, I think only page lock can be used to deadlock inside
>> fuse_dev_read/write(). So requests that don't have locked pages
>> should be okay with just waiting until copy_to/from_user() finishes
>> and only then proceeding with the abort.
>
> Actually, looking at that some more, this might be not true. Anything
> that takes ->mmap_sem exclusive and *not* killable makes for another
> source of deadlock.
>
> Initial page fault takes ->mmap_sem shared. OK, request sent to
> server and server tries to read() it. In the meanwhile, something
> has closed userfaultfd for the same mm_struct. We have userfaultfd_release()
> block on attempt to take ->mmap_sem exclusive and from now on any attempt
> to grab ->mmap_sem shared will deadlock. And get_user_pages(), as well
> as copy_to_user(), etc. can end up doing just that. It doesn't have to
> be an mmap of the same file, BTW - any page fault would do.
>
> All you really need is to have server sharing address space with the
> process that steps into original page fault, plus an evicted page
> of any nature (anon mmap, whatever) being used as a destination of
> read() in server.
>
> down_read() inside down_read() is fine, unless there had been down_write()
> in between. And there are unkillable down_write() on ->mmap_sem -
> userfaultfd_release() being one example of such. Many of those can and
> probably should become down_write_killable(), but this one can't - there
> might be nothing to deliver the signal to, if the final close() happens
> e.g. from exit(2).
>
> Warning: the above might be completely bogus - I'm on way too large
> uptime at the moment and most of the last day had been spent digging
> through various convoluted code, so take the above with a cartload of
> salt. _If_ it's true, that kind of deadlock won't be possible to
> break with killing anything or doing umount -f, though.

It's not bogus, the deadlock is there. But I think it's breakable in
the same way: if the deadlocked request is aborted, the fault will
release the page lock as well as mmap_sem, and from there things will
resolve themselves.

But you are definitely right about needing to clean up that mess in
fuse/dev.c and doing so by fixing up the arg refcounting for just the
read and write requests is going to be a lot simpler than having to do
that for all of them (which was my original plan).

So, I'll have a go at that sometime.

Thanks,
Miklos
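
For anyone following along, the lock ordering discussed above can be played out in a toy model. This is only a rough Python sketch, not kernel code: `FairRWSem`, `wait_edges()`, `run()`, `abort_request()` and the task names "faulter", "closer" and "server" are all made up for illustration. It assumes a strictly FIFO rwsem where new readers queue behind any waiting writer, and it tracks only mmap_sem, not the page lock.

```python
class FairRWSem:
    """Toy writer-fair rwsem (hypothetical model, not the kernel's rwsem)."""

    def __init__(self):
        self.readers = []   # tasks holding the lock shared
        self.writer = None  # task holding the lock exclusive
        self.waiters = []   # FIFO of (task, mode)

    def down_read(self, task):
        # A reader gets in only if no writer holds the lock and nobody is queued.
        if self.writer is None and not self.waiters:
            self.readers.append(task)
            return True
        self.waiters.append((task, "read"))
        return False

    def down_write(self, task):
        if self.writer is None and not self.readers and not self.waiters:
            self.writer = task
            return True
        self.waiters.append((task, "write"))
        return False

    def up_read(self, task):
        self.readers.remove(task)
        self._wake()

    def up_write(self):
        self.writer = None
        self._wake()

    def _wake(self):
        # Grant the lock to queued tasks in FIFO order for as long as possible.
        while self.waiters and self.writer is None:
            task, mode = self.waiters[0]
            if mode == "write":
                if self.readers:
                    return
                self.writer = task
            else:
                self.readers.append(task)
            self.waiters.pop(0)


def wait_edges(sem):
    """Map each queued task to the tasks it is waiting behind."""
    ahead = set(sem.readers)
    if sem.writer is not None:
        ahead.add(sem.writer)
    edges = {}
    for task, _mode in sem.waiters:
        edges[task] = set(ahead)
        ahead = {task}  # later waiters queue behind this one
    return edges


def has_cycle(waits):
    """True if some task transitively waits on itself."""
    def reaches(start, node, seen):
        for nxt in waits.get(node, ()):
            if nxt == start:
                return True
            if nxt not in seen:
                seen.add(nxt)
                if reaches(start, nxt, seen):
                    return True
        return False
    return any(reaches(task, task, set()) for task in waits)


def run(writer_in_between):
    sem = FairRWSem()
    sem.down_read("faulter")         # page fault takes mmap_sem shared
    waits = {"faulter": {"server"}}  # ...and waits for the server's reply
    if writer_in_between:
        sem.down_write("closer")     # unkillable exclusive waiter queues up
    sem.down_read("server")          # server's read() faults on its buffer
    waits.update(wait_edges(sem))
    return sem, has_cycle(waits)


def abort_request(sem):
    sem.up_read("faulter")  # aborted fault drops mmap_sem; closer now gets it
    sem.up_write()          # closer finishes; server's queued read is granted
```

In this model `run(False)` leaves both readers holding the lock with no wait cycle, while `run(True)` ends with "closer" and "server" queued and a faulter -> server -> closer -> faulter cycle. Calling `abort_request()` on the deadlocked semaphore then lets the writer through and the server's read lock after it, which is the resolution described in the reply.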