On Tue, 28 Feb 2023, Huang, Ying wrote:
> Jan Kara <jack@xxxxxxx> writes:
> > On Fri 17-02-23 13:47:48, Hugh Dickins wrote:
> >>
> >> Cc'ing Jan Kara, who knows buffer_migrate_folio_norefs() and jbd2
> >> very well, and I hope can assure us that there is an understandable
> >> deadlock here, from holding several random folio locks, then trying
> >> to lock buffers.  Cc'ing fsdevel, because there's a risk that mm
> >> folk think something is safe, when it's not sufficient to cope with
> >> the diversity of filesystems.  I hope nothing more than the below is
> >> needed (and I've had no other problems with the patchset: good job),
> >> but cannot be sure.
> >
> > I suspect it can indeed be caused by the presence of the loop device as
> > Huang Ying has suggested. What filesystems using buffer_heads do is a
> > pattern like:
> >
> >     bh = page_buffers(loop device page cache page);
> >     lock_buffer(bh);
> >     submit_bh(bh);
> >     - now on loop dev this ends up doing:
> >       lo_write_bvec()
> >         vfs_iter_write()
> >           ...
> >             folio_lock(backing file folio);
> >
> > So if migration code holds "backing file folio" lock and at the same time
> > waits for 'bh' lock (while trying to migrate loop device page cache page), it
> > is a deadlock.
> >
> > Proposed solution of never waiting for locks in batched mode looks like a
> > sensible one to me...
>
> Thank you very much for detail explanation!

Yes, thanks a lot, Jan, for elucidating the deadlocks.

I was running with the 1/3,2/3,3/3 series for 48 hours on two machines
at the weekend, and hit no problems with all of them on.

I did try to review them this evening, but there's too much for
me to take in there to give an Acked-by: but I'll ask a couple of
questions.

Hugh
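
For anyone following the thread, the "never wait for locks in batched mode" rule can be illustrated with a minimal user-space sketch. This is not the kernel code and not the patch under discussion: struct toy_lock and the toy_*() helpers are hypothetical names invented here, and a C11 atomic_flag stands in for a folio or buffer lock. The only point shown is that a context which may already hold other locks tries each new lock once and skips the busy ones, so it can never sit waiting on a lock whose holder is in turn waiting on it.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a folio or buffer lock; not a kernel type. */
struct toy_lock {
	atomic_flag held;
};

/* Non-blocking acquire: returns true only if we took the lock. */
static bool toy_trylock(struct toy_lock *l)
{
	return !atomic_flag_test_and_set(&l->held);
}

static void toy_unlock(struct toy_lock *l)
{
	atomic_flag_clear(&l->held);
}

/*
 * Batched mode: the caller may already hold other locks, so it must
 * never block.  Each lock is tried exactly once; busy ones are skipped
 * and left for a later retry.  got[i] records which locks were taken,
 * and the return value is how many were acquired.
 */
static size_t toy_lock_batch(struct toy_lock **locks, size_t n, bool *got)
{
	size_t taken = 0;

	assert(locks && got);
	for (size_t i = 0; i < n; i++) {
		got[i] = toy_trylock(locks[i]);
		if (got[i])
			taken++;
	}
	return taken;
}

/* Release only the locks that toy_lock_batch() actually acquired. */
static void toy_unlock_batch(struct toy_lock **locks, size_t n,
			     const bool *got)
{
	for (size_t i = 0; i < n; i++)
		if (got[i])
			toy_unlock(locks[i]);
}
```

In the scenario Jan describes, the buffer lock would be one of the busy entries: the batch simply skips it instead of waiting while still holding the backing file folio lock.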