On Wed, Oct 14, 2020 at 6:48 PM Qian Cai <cai@xxxxxx> wrote:
>
> While on this topic, I just want to bring up a bug report that we are chasing an
> issue that a process is stuck in the loop of wait_on_page_bit_common() for more
> than 10 minutes before I gave up.

Judging by call trace, that looks like a deadlock rather than a missed wakeup.

The trace isn't reliable, but I find it suspicious that the call trace
just before the fault contains that "iov_iter_copy_from_user_atomic()".

IOW, I think you're in fuse_fill_write_pages(), which has allocated
the page, locked it, and then it takes a page fault.

And the page fault waits on a page that is locked.

This is a classic deadlock.

The *intent* is that iov_iter_copy_from_user_atomic() returns zero,
and you retry without the page lock held.

HOWEVER.

That's not what fuse actually does. Fuse will do multiple pages, and
it will unlock only the _last_ page. It keeps the other pages locked,
and puts them in an array:

                ap->pages[ap->num_pages] = page;

And after the iov_iter_copy_from_user_atomic() fails, it does that
"unlock" and repeat.

But while the _last_ page was unlocked, the *previous* pages are still
locked in that array. Deadlock.

I really don't think this has anything at all to do with page locking,
and everything to do with fuse_fill_write_pages() having a deadlock if
the source of data is a mmap of one of the pages it is trying to write
to (just with an offset, so that it's not the last page).

See a similar code sequence in generic_perform_write(), but notice how
that code only has *one* page that it locks, and never holds an array
of pages around over that iov_iter_fault_in_readable() thing.

                Linus
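
To illustrate the locking pattern described above, here is a toy userspace model. It is a sketch under loud assumptions, not the fuse code: struct toy_page, toy_fill_write_pages_batched() and toy_perform_write_single() are hypothetical names, a page is reduced to its lock bit, and a fault that would sleep on a locked page is reported through the return value instead of actually blocking. src[i] names the file page that backs the user buffer for destination page i (an mmap of the same file), or -1 for ordinary memory.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for struct page: only the lock bit is modeled. */
struct toy_page {
    bool locked;
};

/*
 * Model of a writer that keeps every page of the batch locked, as the
 * mail describes for fuse_fill_write_pages().
 *
 * Returns the index of the destination page at which the writer would
 * fault on a page it still holds locked (the self-deadlock), or -1 if
 * the whole batch completes.
 */
int toy_fill_write_pages_batched(struct toy_page *pages, const int *src, int n)
{
    for (int i = 0; i < n; i++) {
        assert(src[i] < n);

        /* Faulting in the source has to wait for that page's lock. */
        if (src[i] >= 0 && pages[src[i]].locked)
            return i;

        /* Lock the destination page and keep it locked in the batch. */
        pages[i].locked = true;
    }

    /* Batch sent: only now are the pages unlocked. */
    for (int i = 0; i < n; i++)
        pages[i].locked = false;
    return -1;
}

/*
 * Model of a writer that holds at most one locked page at a time, as
 * the mail describes for generic_perform_write().
 */
int toy_perform_write_single(struct toy_page *pages, const int *src, int n)
{
    for (int i = 0; i < n; i++) {
        assert(src[i] < n);

        /* No page is locked here, so the fault-in cannot self-block. */
        if (src[i] >= 0 && pages[src[i]].locked)
            return i;

        pages[i].locked = true;   /* lock, copy ... */
        pages[i].locked = false;  /* ... and unlock before the next page */
    }
    return -1;
}
```

With three pages and src = { -1, -1, 0 }, the batched model stops at page 2, because page 0 is still locked in the batch when its contents are needed as the source. The one-page-at-a-time model completes. With src = { -1, -1, 2 } the batched model also completes, which matches the "not the last page" condition in the mail.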