Re: [RFC PATCH] btrfs: don't call btrfs_sync_file from iomap context

Dave Chinner <david@xxxxxxxxxxxxx> · Thu, 17 Sep 2020 13:09:42 +1000

On Tue, Sep 15, 2020 at 04:48:53PM -0500, Goldwyn Rodrigues wrote:
> On 10:04 07/09, Dave Chinner wrote:
> > On Thu, Sep 03, 2020 at 06:32:36PM +0200, Christoph Hellwig wrote:
> > > We could trivially do something like this to allow the file system
> > > to call iomap_dio_complete without i_rwsem:
> > 
> > That just exposes another deadlock vector:
> > 
> > P0			P1
> > inode_lock()		fallocate(FALLOC_FL_ZERO_RANGE)
> > __iomap_dio_rw()	inode_lock()
> > 			<block>
> > <submits IO>
> > <completes IO>
> > inode_unlock()
> > 			<gets inode_lock()>
> > 			inode_dio_wait()
> > iomap_dio_complete()
> >   generic_write_sync()
> >     btrfs_file_fsync()
> >       inode_lock()
> >       <deadlock>
> 
> Can inode_dio_end() be called before generic_write_sync(), as it is done
> in fs/direct-io.c:dio_complete()?

Don't think so.  inode_dio_wait() is supposed to indicate that all
DIO is complete, and having the "make it stable" parts of an O_DSYNC
DIO still running after inode_dio_wait() returns means that we still
have DIO running....

For some filesystems, ensuring the DIO data is stable may involve
flushing other data (perhaps we did EOF zeroing before the file
extending DIO) and/or metadata to the log, so we need to guarantee
these DIO related operations are complete and stable before we say
the DIO is done.
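
To make the ordering argument concrete, here is a small userspace toy model. It is not kernel code and is only a sketch under stated assumptions: struct toy_inode and every toy_*() helper are hypothetical stand-ins, with dio_count modelling the in-flight DIO count that inode_dio_wait() waits on, and toy_write_sync() modelling the O_DSYNC generic_write_sync() step. Each function walks one completion ordering and reports whether a waiter could ever observe "no DIO in flight" while the data is still unstable.

```c
#include <stdbool.h>

/* Hypothetical stand-in for struct inode: only the state this argument needs. */
struct toy_inode {
	int dio_count;		/* models the in-flight DIO count */
	bool data_stable;	/* set once the O_DSYNC flush has finished */
};

/* Models inode_dio_begin(): one more DIO in flight. */
static void toy_dio_begin(struct toy_inode *inode)
{
	inode->dio_count++;
}

/* Models inode_dio_end(): may let inode_dio_wait() callers proceed. */
static void toy_dio_end(struct toy_inode *inode)
{
	inode->dio_count--;
}

/* Models the "make it stable" part of an O_DSYNC DIO. */
static void toy_write_sync(struct toy_inode *inode)
{
	inode->data_stable = true;
}

/* True if a waiter would be released while the data is not yet stable. */
static bool toy_waiter_sees_unstable(const struct toy_inode *inode)
{
	return inode->dio_count == 0 && !inode->data_stable;
}

/* Ordering argued for above: sync first, then signal DIO completion. */
static bool toy_sync_then_end(struct toy_inode *inode)
{
	bool window = false;

	toy_dio_begin(inode);
	window |= toy_waiter_sees_unstable(inode);
	toy_write_sync(inode);
	window |= toy_waiter_sees_unstable(inode);
	toy_dio_end(inode);
	window |= toy_waiter_sees_unstable(inode);
	return window;
}

/* Ordering asked about: signal DIO completion first, then sync. */
static bool toy_end_then_sync(struct toy_inode *inode)
{
	bool window = false;

	toy_dio_begin(inode);
	window |= toy_waiter_sees_unstable(inode);
	toy_dio_end(inode);
	window |= toy_waiter_sees_unstable(inode);
	toy_write_sync(inode);
	window |= toy_waiter_sees_unstable(inode);
	return window;
}
```

On a zero-initialised struct toy_inode, toy_sync_then_end() never exposes the window, while toy_end_then_sync() does: between toy_dio_end() and toy_write_sync() the count is zero but the data is not stable, which is exactly the state inode_dio_wait() callers must never see.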

> Christoph's solution is a clean approach and I would prefer to use it as
> the final solution.

/me shrugs

Christoph's solution simply means you can't use inode_dio_wait() in
the filesystem. btrfs would need its own DIO barrier....

Cheers,

Dave.
-- 
Dave Chinner
david@xxxxxxxxxxxxx