Re: [PATCH v3 0/2] iov_iter: allow iov_iter_get_pages_alloc to allocate more pages per call

Miklos Szeredi <miklos@xxxxxxxxxx> · Mon, 6 Feb 2017 15:18:42 +0100

On Mon, Feb 6, 2017 at 10:57 AM, Al Viro <viro@xxxxxxxxxxxxxxxxxx> wrote:
> On Mon, Feb 06, 2017 at 10:08:06AM +0100, Miklos Szeredi wrote:
>
>> Yes, I think only page lock can be used to deadlock inside
>> fuse_dev_read/write().  So requests that don't have locked pages
>> should be okay  with just waiting until copy_to/from_user() finishes
>> and only then proceeding with the abort.
>
> Actually, looking at that some more, this might be not true.  Anything
> that takes ->mmap_sem exclusive and *not* killable makes for another
> source of deadlock.
>
> Initial page fault takes ->mmap_sem shared.  OK, request sent to
> server and server tries to read() it.  In the meanwhile, something
> has closed userfaultfd for the same mm_struct.  We have userfaultfd_release()
> block on attempt to take ->mmap_sem exclusive and from now on any attempt
> to grab ->mmap_sem shared will deadlock.  And get_user_pages(), as well
> as copy_to_user(), etc. can end up doing just that.  It doesn't have to
> be an mmap of the same file, BTW - any page fault would do.
>
> All you really need is to have server sharing address space with the
> process that steps into original page fault, plus an evicted page
> of any nature (anon mmap, whatever) being used as a destination of
> read() in server.
>
> down_read() inside down_read() is fine, unless there had been down_write()
> in between.  And there are unkillable down_write() on ->mmap_sem -
> userfaultfd_release() being one example of such.  Many of those can and
> probably should become down_write_killable(), but this one can't - there
> might be nothing to deliver the signal to, if the final close() happens
> e.g. from exit(2).
>
> Warning: the above might be completely bogus - I'm on way too large
> uptime at the moment and most of the last day had been spent digging
> through various convoluted code, so take the above with a cartload of
> salt.  _If_ it's true, that kind of deadlock won't be possible to
> break with killing anything or doing umount -f, though.

It's not bogus, the deadlock is there.

But I think it's breakable in the same way: if the deadlocked request
is aborted, the fault will release the page lock as well as mmap_sem,
and from there things will resolve themselves.

But you are definitely right about needing to clean up that mess in
fuse/dev.c and doing so by fixing up the arg refcounting for just the
read and write requests is going to be a lot simpler than having to do
that for all of them (which was my original plan).

So, I'll have a go at that sometime.

Thanks,
Miklos