Re: [PATCH 0/4] Some more lock_page work..

Linus Torvalds <torvalds@xxxxxxxxxxxxxxxxxxxx> · Wed, 14 Oct 2020 19:44:16 -0700

On Wed, Oct 14, 2020 at 6:48 PM Qian Cai <cai@xxxxxx> wrote:
>
> While on this topic, I just want to bring up a bug report that we are chasing an
> issue that a process is stuck in the loop of wait_on_page_bit_common() for more
> than 10 minutes before I gave up.

Judging by call trace, that looks like a deadlock rather than a missed wakeup.

The trace isn't reliable, but I find it suspicious that the call trace
just before the fault contains that
"iov_iter_copy_from_user_atomic()".

IOW, I think you're in fuse_fill_write_pages(), which has allocated
the page, locked it, and then it takes a page fault.

And the page fault waits on a page that is locked.

This is a classic deadlock.

The *intent* is that iov_iter_copy_from_user_atomic() returns zero,
and you retry without the page lock held.

HOWEVER.

That's not what fuse actually does. Fuse will do multiple pages, and
it will unlock only the _last_ page. It keeps the other pages locked,
and puts them in an array:

                ap->pages[ap->num_pages] = page;

And after the iov_iter_copy_from_user_atomic() fails, it does that
"unlock" and repeat.

But while the _last_ page was unlocked, the *previous* pages are still
locked in that array. Deadlock.

I really don't think this has anything at all to do with page locking,
and everything to do with fuse_fill_write_pages() having a deadlock if
the source of data is a mmap of one of the pages it is trying to write
to (just with an offset, so that it's not the last page).

See a similar code sequence in generic_perform_write(), but notice how
that code only has *one* page that it locks, and never holds an array
of pages around over that iov_iter_fault_in_readable() thing.

              Linus