Re: Possible deadlock in fuse write path (Was: Re: [PATCH 0/4] Some more lock_page work..)

Linus Torvalds <torvalds@xxxxxxxxxxxxxxxxxxxx> · Thu, 15 Oct 2020 14:21:58 -0700

On Thu, Oct 15, 2020 at 12:55 PM Vivek Goyal <vgoyal@xxxxxxxxxx> wrote:
>
> I am wondering how should I fix this issue. Is it enough that I drop
> the page lock (but keep the reference) inside the loop. And once copying
> from user space is done, acquire page locks for all pages (Attached
> a patch below).

What is the page lock supposed to protect?

Because whatever it protects, dropping the lock drops that protection,
and you'd need to re-check whatever the page lock was there for.

> Or dropping page lock means that there are no guarantees that this
> page did not get written back and removed from address space and
> a new page has been placed at same offset. Does that mean I should
> instead be looking up page cache again after copying from user
> space is done.

I don't know why fuse does multiple pages to begin with. Why can't it
do whatever it does just one page at a time?

But yes, you probably should look the page up again whenever you've
unlocked it, because it might have been truncated or whatever.
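
[Illustration, not part of the original message.] The "look it up again"
point can be sketched with a tiny user-space model. Everything here is
hypothetical (toy_mapping, toy_page, toy_truncate and the other toy_*
helpers are made-up names, not kernel APIs); it only assumes that a page
holds a back-pointer to its mapping and that truncation clears it, loosely
mirroring the common kernel pattern of re-checking page->mapping after
taking the page lock.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define TOY_SLOTS 4

struct toy_mapping;

/* Hypothetical stand-in for a page cache page. */
struct toy_page {
	struct toy_mapping *mapping;	/* NULL once truncated away */
	bool locked;
};

/* Hypothetical stand-in for an address space: one slot per index. */
struct toy_mapping {
	struct toy_page *slots[TOY_SLOTS];
};

/* Look the page up by index and lock it if it is present. */
struct toy_page *toy_lookup_and_lock(struct toy_mapping *m, size_t index)
{
	struct toy_page *pg = m->slots[index];

	if (pg)
		pg->locked = true;
	return pg;
}

/* Truncation can only remove a page nobody holds locked. */
void toy_truncate(struct toy_mapping *m, size_t index)
{
	struct toy_page *pg = m->slots[index];

	if (pg) {
		assert(!pg->locked);
		pg->mapping = NULL;
		m->slots[index] = NULL;
	}
}

/*
 * Re-lock a page we only kept a reference to, then re-check that it is
 * still the page at 'index'. On failure the page is left unlocked and
 * the caller has to do a fresh lookup.
 */
bool toy_relock_still_valid(struct toy_mapping *m, struct toy_page *pg,
			    size_t index)
{
	pg->locked = true;
	if (pg->mapping != m || m->slots[index] != pg) {
		pg->locked = false;
		return false;
	}
	return true;
}
```

Holding a reference keeps the toy_page memory alive in this model, but it
says nothing about whether the page is still in the mapping. That is the
check that has to be redone after every unlock.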

Note that this is purely about unlocking the page, not about "after
copying from user space". The iov_iter_copy_from_user_atomic() part is
safe - if that takes a page fault, it will just do a partial copy, it
won't deadlock.

So you can potentially do multiple pages, and keep them all locked,
but only as long as the copies are all done with that
"from_user_atomic()" case. Which normally works fine, since normal
users will write stuff that they just generated, so it will all be
there.

It's only when that returns zero, and you do the fallback to pre-fault
in any data with iov_iter_fault_in_readable() that you need to unlock
_all_ pages (and once you do that, I don't see what possible advantage
the multi-page array can have).
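
[Illustration, not part of the original message.] The rule above can be
sketched as a user-space toy: keep several pages locked only while the
copies are non-blocking, and drop every lock before the blocking fault-in.
All names are hypothetical. toy_copy_atomic() stands in for
iov_iter_copy_from_user_atomic() and toy_fault_in() for
iov_iter_fault_in_readable(); the "resident" flag is an assumed
simplification of whether the user buffer is paged in, and restarting the
whole batch after a fault-in is a simplification too, not what fuse or the
generic write code does.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define TOY_PAGE_SIZE 8

struct toy_page {
	bool locked;
	char data[TOY_PAGE_SIZE];
};

/* Hypothetical user buffer; 'resident' models "already paged in". */
struct toy_user_buf {
	const char *data;
	size_t len;
	bool resident;
};

/* Non-blocking copy: returns 0 instead of faulting the source in. */
static size_t toy_copy_atomic(struct toy_page *pg,
			      const struct toy_user_buf *src,
			      size_t off, size_t n)
{
	assert(pg->locked);
	if (!src->resident)
		return 0;
	memcpy(pg->data, src->data + off, n);
	return n;
}

/* Blocking fault-in: only legal while no page in the batch is locked. */
static void toy_fault_in(struct toy_user_buf *src,
			 const struct toy_page *pages, size_t npages)
{
	for (size_t i = 0; i < npages; i++)
		assert(!pages[i].locked);
	src->resident = true;
}

/* Returns bytes written; *fallbacks counts trips through the slow path. */
size_t toy_write_batch(struct toy_page *pages, size_t npages,
		       struct toy_user_buf *src, unsigned *fallbacks)
{
	for (;;) {
		size_t done = 0, locked = 0;
		bool short_copy = false;

		/* Fast path: lock pages one by one and keep them locked. */
		while (done < src->len && locked < npages) {
			struct toy_page *pg = &pages[locked];
			size_t n = src->len - done;

			if (n > TOY_PAGE_SIZE)
				n = TOY_PAGE_SIZE;
			pg->locked = true;
			locked++;
			if (toy_copy_atomic(pg, src, done, n) == 0) {
				short_copy = true;
				break;
			}
			done += n;
		}

		/* Unlock _all_ pages before returning or faulting in. */
		for (size_t i = 0; i < locked; i++)
			pages[i].locked = false;

		if (!short_copy)
			return done;

		/* Slow path: no locks held, so a page fault cannot deadlock. */
		toy_fault_in(src, pages, npages);
		(*fallbacks)++;
	}
}
```

In this model a 12-byte write into two pages whose source starts out
non-resident takes the slow path once and then completes on the retry.
Once the slow path runs, nothing is gained from having held several pages
locked, which is the point made above.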

Of course, the way that code is written, it always does the
iov_iter_fault_in_readable() for each page - it's not written like
some kind of "special case fallback thing".

I suspect the code was copied from the generic write code, but without
understanding why the generic write code was ok.

               Linus