On 9:31 22/09, Darrick J. Wong wrote: > On Tue, Sep 22, 2020 at 10:20:11AM -0400, Josef Bacik wrote: > > On 9/21/20 10:43 AM, Goldwyn Rodrigues wrote: > > > From: Goldwyn Rodrigues <rgoldwyn@xxxxxxxx> > > > > > > iomap complete routine can deadlock with btrfs_fallocate because of the > > > call to generic_write_sync(). > > > > > > P0 P1 > > > inode_lock() fallocate(FALLOC_FL_ZERO_RANGE) > > > __iomap_dio_rw() inode_lock() > > > <block> > > > <submits IO> > > > <completes IO> > > > inode_unlock() > > > <gets inode_lock()> > > > inode_dio_wait() > > > iomap_dio_complete() > > > generic_write_sync() > > > btrfs_file_fsync() > > > inode_lock() > > > <deadlock> > > > > > > inode_dio_end() is used to notify the end of DIO data in order > > > to synchronize with truncate. Call inode_dio_end() before calling > > > generic_write_sync(), so filesystems can lock i_rwsem during a sync. > > > > > > --- > > > fs/iomap/direct-io.c | 2 +- > > > 1 file changed, 1 insertion(+), 1 deletion(-) > > > > > > diff --git a/fs/iomap/direct-io.c b/fs/iomap/direct-io.c > > > index d970c6bbbe11..e01f81e7b76f 100644 > > > --- a/fs/iomap/direct-io.c > > > +++ b/fs/iomap/direct-io.c > > > @@ -118,6 +118,7 @@ ssize_t iomap_dio_complete(struct iomap_dio *dio) > > > dio_warn_stale_pagecache(iocb->ki_filp); > > > } > > > + inode_dio_end(file_inode(iocb->ki_filp)); > > > /* > > > * If this is a DSYNC write, make sure we push it to stable storage now > > > * that we've written data. > > > @@ -125,7 +126,6 @@ ssize_t iomap_dio_complete(struct iomap_dio *dio) > > > if (ret > 0 && (dio->flags & IOMAP_DIO_NEED_SYNC)) > > > ret = generic_write_sync(iocb, ret); > > > - inode_dio_end(file_inode(iocb->ki_filp)); > > > kfree(dio); > > > return ret; > > > > > > > Did you verify that xfs or ext4 don't rely on the inode_dio_end() happening > > before the generic_write_sync()? I wouldn't expect that they would, but > > we've already run into problems making those kind of assumptions. If it's > > fine you can add > > I was gonna ask the same question, but as there's no SoB on this patch I > hadn't really looked at it yet. ;) > > Operations that rely on inode_dio_wait to have blocked until all the > directios are complete could get tripped up by iomap not having done the > generic_write_sync to stabilise the metadata, but I /think/ most > operations that do that also themselves flush the file. But I don't > really know if there's a subtlety there if the inode_dio_wait thread > manages to grab the ILOCK before the generic_write_sync thread does. > I ran xfstests and it was successful. I am testing ext4 now. -- Goldwyn