On Tue, 2017-04-18 at 10:15 +0200, Michal Hocko wrote:
> On Sat 15-04-17 00:59:46, Bart Van Assche wrote:
> > On Fri, 2017-04-14 at 17:40 -0700, Hugh Dickins wrote:
> > > Changing a fundamental function, silently not to do its essential job,
> > > when something in the kernel has forgotten (or is slow to) unlock_page():
> > > that seems very wrong to me in many ways. But linux-fsdevel, Cc'ed, will
> > > be a better forum to advise on how to solve the problem you're seeing.
> >
> > It seems like you have misunderstood the purpose of the patch I posted. It's
> > neither a missing unlock_page() nor slow I/O that I want to address but a
> > genuine deadlock. In case you would not be familiar with the queue_if_no_path
> > multipath configuration option, the multipath.conf man page is available at
> > e.g. https://linux.die.net/man/5/multipath.conf.
>
> So, who is holding the page lock and why it cannot make forward
> progress? Is the storage gone so that the ongoing IO will never
> terminate? Btw. we have many other places which wait for the page lock
> !killable way. Why they are any different from this case?

Hello Michal,

queue_if_no_path means that if no paths are available, the dm-mpath
driver does not complete an I/O request until a path becomes available.
A standard test for multipathed storage is to alternately remove and
restore all paths. If the reported lockup happens at the end of such a
test, I can break the cycle by running "dmsetup message ${mpath} 0
fail_if_no_path". That command causes pending I/O requests to fail if
no paths are available.

I think it is rather unintuitive that kill -9 does not work for a
process that uses a dm-mpath device for I/O as long as no paths are
available. The call stack I reported in the first e-mail in this thread
is what I ran into while running multipath tests. I'm not sure why I
have not yet hit any other code paths that perform an unkillable wait
on a page lock.

Bart.
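
P.S. In case a sketch helps to reproduce this, the configuration and
the workaround roughly look as follows. The map name "mpatha" and the
sdX device names below are examples only, and the exact multipath.conf
syntax may differ between multipath-tools versions:

    # /etc/multipath.conf excerpt: queue I/O indefinitely while no
    # paths are available (equivalent to features "1 queue_if_no_path").
    defaults {
            no_path_retry queue
    }

    # One way to remove all paths of a map: set the underlying SCSI
    # devices offline (example device names).
    for dev in sdb sdc; do
            echo offline > /sys/block/$dev/device/state
    done

    # I/O submitted to /dev/mapper/mpatha now queues inside dm-mpath
    # and a process waiting on it cannot be killed, not even by kill -9.

    # Break the cycle: make dm-mpath fail queued and future I/O
    # instead of queueing it while no paths are available.
    dmsetup message mpatha 0 fail_if_no_path

    # Restore the paths and re-enable queueing afterwards.
    for dev in sdb sdc; do
            echo running > /sys/block/$dev/device/state
    done
    dmsetup message mpatha 0 queue_if_no_path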