On Tue, Jul 21, 2015 at 12:05:56AM -0400, Ming Lei wrote:
> On Mon, Jul 20, 2015 at 9:59 PM, Dave Chinner <david@xxxxxxxxxxxxx> wrote:
> > Hi Ming,
> >
> > With the recent merge of the loop device changes, I'm now seeing
> > XFS deadlock on my single CPU, 1GB RAM VM running xfs/073.
> >
> > The deadlocked is as follows:
> >
> > kloopd1: loop_queue_read_work
> >     xfs_file_iter_read
> >     lock XFS inode XFS_IOLOCK_SHARED (on image file)
> >     page cache read (GFP_KERNEL)
> >     radix tree alloc
> >     memory reclaim
> >     reclaim XFS inodes
> >     log force to unpin inodes
> >     <wait for log IO completion>
> >
> > xfs-cil/loop1: <does log force IO work>
> >     xlog_cil_push
> >     xlog_write
> >     <loop issuing log writes>
> >         xlog_state_get_iclog_space()
> >         <blocks due to all log buffers under write io>
> >         <waits for IO completion>
> >
> > kloopd1: loop_queue_write_work
> >     xfs_file_write_iter
> >     lock XFS inode XFS_IOLOCK_EXCL (on image file)
> >     <wait for inode to be unlocked>
> >
> > [The full stack traces are below].
> >
> > i.e. the kloopd, with it's split read and write work queues, has
> > introduced a dependency through memory reclaim. i.e. that writes
> > need to be able to progress for reads make progress.
>
> This kind of change just makes READ vs READ OR WRITE submitted
> to fs concurrently, and the use case should have been simulated from
> user space on one regular XFS file too?

Assuming the "regular XFS file" is on a normal block device (i.e.
not a loop device) then this will not deadlock, as there is no
dependency on VFS-level locking for log writes.

i.e. the normal userspace IO path is:

userspace read
  vfs_read
    xfs_read
      page cache alloc (GFP_KERNEL)
        direct reclaim
          xfs_inode reclaim
            log force
              CIL push
                <workqueue>
                  xlog_write
                    submit_bio
                      -> hardware.

And then the log IO completes, and everything continues onward.

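As an aside, the difference between the two cases can be written down as a
wait-for graph. The sketch below is a toy userspace model, not kernel code:
the enum labels, struct wait_graph and waits_on_itself() are all made up for
illustration, and each edge only restates one "waits for" step from the
traces above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical labels for the contexts involved; not kernel identifiers. */
enum ctx { READER, CIL_PUSH, IMAGE_WRITER, HARDWARE, NR_CTX };

/* waits_for[a] == b means "a cannot progress until b does"; -1 means a waits for nothing. */
struct wait_graph {
	int waits_for[NR_CTX];
};

/*
 * Returns true if 'start' transitively waits for itself. Every node has at
 * most one outgoing edge, so following the chain for NR_CTX steps is enough.
 */
static bool waits_on_itself(const struct wait_graph *g, enum ctx start)
{
	int cur = start;

	for (size_t steps = 0; steps < NR_CTX; steps++) {
		cur = g->waits_for[cur];
		if (cur < 0)
			return false;	/* chain ends: something can make progress */
		assert(cur < NR_CTX);
		if (cur == (int)start)
			return true;	/* we came back to where we started */
	}
	return false;
}

/* Image file on XFS behind a loop device, as in the quoted report. */
static struct wait_graph loop_backed_log(void)
{
	struct wait_graph g = { .waits_for = {
		[READER]       = CIL_PUSH,	/* holds IOLOCK_SHARED, waits for the log force */
		[CIL_PUSH]     = IMAGE_WRITER,	/* log IO is a write to the image file */
		[IMAGE_WRITER] = READER,	/* needs IOLOCK_EXCL, held shared by the reader */
		[HARDWARE]     = -1,
	} };
	return g;
}

/* Regular XFS file on a normal block device. */
static struct wait_graph plain_block_device(void)
{
	struct wait_graph g = { .waits_for = {
		[READER]       = CIL_PUSH,	/* reclaim still waits for the log force */
		[CIL_PUSH]     = HARDWARE,	/* but log IO goes straight to hardware */
		[IMAGE_WRITER] = -1,
		[HARDWARE]     = -1,
	} };
	return g;
}
```

In this model waits_on_itself() is true for READER in the loop-backed graph
and false in the plain block device graph, because there the chain ends at
hardware.
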
What the loop device used to do:

userspace read
  vfs_read
    xfs_read
      page cache alloc (GFP_KERNEL)
        submit_bio
          loop device
            splice_read (on image file)
              xfs_splice_read
                page cache alloc (GFP_NOFS)
                  direct reclaim
                    <skip filesystem reclaim>
                  submit_bio
                    -> hardware.

And when the read IO completes, everything moves onwards.

What the loop device now does:

userspace read
  vfs_read
    xfs_read
      page cache alloc (GFP_KERNEL)
        submit_bio
          loop device
            <workqueue>
              vfs_read (on image file)
                xfs_read
                  page cache alloc (GFP_KERNEL)
                    direct reclaim
                      xfs_inode reclaim
                        log force
                          CIL push
                            <workqueue>
                              xlog_write
                                submit_bio
                                  loop device
                                    <workqueue>
                                      vfs_write (on image file)
                                        xfs_write
                                          <deadlock on image file lock>

> > The problem, fundamentally, is that mpage_readpages() does a
> > GFP_KERNEL allocation, rather than paying attention to the inode's
> > mapping gfp mask, which is set to GFP_NOFS.
>
> That looks the root cause, and I guess the issue is just triggered
> after commit aa4d86163e4(block: loop: switch to VFS ITER_BVEC)
> which changes splice to bvec iterator.

Yup - you are the unfortunate person who has wandered into the
minefield I'd been telling people about for quite some time. :(

Cheers,

Dave.
-- 
Dave Chinner
david@xxxxxxxxxxxxx
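
To make the gfp mask point concrete, here is a toy userspace model of the
two behaviours. It is a sketch under stated assumptions, not the kernel's
code: the MODEL_* bit values, struct model_mapping and both readpages_gfp_*()
helpers are invented for illustration. The only facts it relies on are that
GFP_NOFS is GFP_KERNEL without the "may enter the filesystem" bit, and that
the image file's mapping mask is GFP_NOFS. Masking the requested flags with
the mapping's mask is one possible way of "paying attention" to it, not
necessarily what any given kernel does.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy flag bits; the values are made up and are not the kernel's. */
#define MODEL_GFP_WAIT   0x1u	/* may sleep */
#define MODEL_GFP_IO     0x2u	/* may start physical IO */
#define MODEL_GFP_FS     0x4u	/* reclaim may call into the filesystem */
#define MODEL_GFP_KERNEL (MODEL_GFP_WAIT | MODEL_GFP_IO | MODEL_GFP_FS)
#define MODEL_GFP_NOFS   (MODEL_GFP_WAIT | MODEL_GFP_IO)

/* Hypothetical stand-in for an address space carrying a gfp mask. */
struct model_mapping {
	unsigned int gfp_mask;
};

/* Direct reclaim only does filesystem reclaim if the FS bit is set. */
static bool reclaim_may_enter_fs(unsigned int gfp)
{
	return (gfp & MODEL_GFP_FS) != 0;
}

/* The behaviour described above: allocate with GFP_KERNEL regardless. */
static unsigned int readpages_gfp_ignoring_mapping(const struct model_mapping *m)
{
	(void)m;	/* the mapping's mask is never consulted */
	return MODEL_GFP_KERNEL;
}

/* Assumed alternative: never ask for more than the mapping allows. */
static unsigned int readpages_gfp_obeying_mapping(const struct model_mapping *m)
{
	unsigned int gfp = MODEL_GFP_KERNEL & m->gfp_mask;

	assert((gfp & ~m->gfp_mask) == 0);	/* no bit outside the mapping's mask */
	return gfp;
}
```

With a mapping whose mask is MODEL_GFP_NOFS, the first helper yields flags
that let reclaim enter the filesystem (the path that ends in the log force),
while the second yields MODEL_GFP_NOFS and reclaim skips the filesystem.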