Re: [RFC PATCH] btrfs: don't call btrfs_sync_file from iomap context

Christoph Hellwig <hch@xxxxxx> · Thu, 17 Sep 2020 07:52:32 +0200

On Thu, Sep 17, 2020 at 01:09:42PM +1000, Dave Chinner wrote:
> > > iomap_dio_complete()
> > >   generic_write_sync()
> > >     btrfs_file_fsync()
> > >       inode_lock()
> > >       <deadlock>
> > 
> > Can inode_dio_end() be called before generic_write_sync(), as it is done
> > in fs/direct-io.c:dio_complete()?
> 
> Don't think so.  inode_dio_wait() is supposed to indicate that all
> DIO is complete, and having the "make it stable" parts of an O_DSYNC
> DIO still running after inode_dio_wait() returns means that we still
> have DIO running....
> 
> For some filesystems, ensuring the DIO data is stable may involve
> flushing other data (perhaps we did EOF zeroing before the file
> extending DIO) and/or metadata to the log, so we need to guarantee
> these DIO related operations are complete and stable before we say
> the DIO is done.

inode_dio_wait really just waits for active I/O that writes to or reads
from the file.  It does not imply that the I/O is stable, just like
i_rwsem itself doesn't.  Various file systems have historically done
the syncing outside i_rwsem and inode_dio_wait (in fact that is what the
fs/direct-io.c code does, so XFS did as well until a few years ago), and
that isn't a problem at all - we just can't return to userspace (or call
ki_complete for in-kernel users) before the data is stable on disk.
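A minimal userspace sketch of the two orderings follows. It assumes a pthread error-checking mutex as a stand-in for i_rwsem and an atomic counter as a stand-in for the inode's DIO count. All names here (struct model_inode, model_fsync, complete_dio_under_lock, complete_dio_sync_outside_lock) are hypothetical, and none of this is kernel code. It only illustrates the point above: syncing while the lock is still held self-deadlocks, while ending the DIO and dropping the lock first is fine as long as completion is reported only after the sync has succeeded.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical userspace model of an inode; not kernel code. */
struct model_inode {
	pthread_mutex_t i_rwsem; /* stand-in for the inode lock */
	atomic_int dio_count;    /* stand-in for the in-flight DIO count */
	bool data_stable;        /* set once the model "fsync" has run */
};

static void model_inode_init(struct model_inode *inode)
{
	pthread_mutexattr_t attr;

	pthread_mutexattr_init(&attr);
	/* Error-checking mutex: relocking from the owner returns EDEADLK
	 * instead of hanging, so the deadlock is observable. */
	pthread_mutexattr_settype(&attr, PTHREAD_MUTEX_ERRORCHECK);
	pthread_mutex_init(&inode->i_rwsem, &attr);
	pthread_mutexattr_destroy(&attr);
	atomic_init(&inode->dio_count, 0);
	inode->data_stable = false;
}

/* Stand-in for an fsync implementation that takes the inode lock itself,
 * as btrfs_file_fsync() does in the trace above. */
static int model_fsync(struct model_inode *inode)
{
	int err = pthread_mutex_lock(&inode->i_rwsem);

	if (err == EDEADLK)
		return EDEADLK; /* caller already holds the lock: a real lock would hang */
	if (err != 0)
		return err;
	inode->data_stable = true; /* pretend data and metadata are now on disk */
	pthread_mutex_unlock(&inode->i_rwsem);
	return 0;
}

/* Ordering from the trace: sync while the lock and the DIO are still held. */
static int complete_dio_under_lock(struct model_inode *inode)
{
	int err;

	pthread_mutex_lock(&inode->i_rwsem);
	atomic_fetch_add(&inode->dio_count, 1);
	/* ... the direct I/O itself would happen here ... */
	err = model_fsync(inode);
	atomic_fetch_sub(&inode->dio_count, 1);
	pthread_mutex_unlock(&inode->i_rwsem);
	return err;
}

/* Ordering described in the reply: end the DIO, drop the lock, then sync,
 * and only report completion once the sync has succeeded. */
static int complete_dio_sync_outside_lock(struct model_inode *inode,
					  bool *completed)
{
	int err;

	pthread_mutex_lock(&inode->i_rwsem);
	atomic_fetch_add(&inode->dio_count, 1);
	/* ... the direct I/O itself would happen here ... */
	atomic_fetch_sub(&inode->dio_count, 1); /* inode_dio_end() analogue */
	pthread_mutex_unlock(&inode->i_rwsem);

	/* No I/O is in flight any more, so an inode_dio_wait() analogue
	 * would not block here, yet the data is not stable yet. */
	assert(atomic_load(&inode->dio_count) == 0);

	err = model_fsync(inode);
	if (err == 0)
		*completed = true; /* only now "return to userspace" */
	return err;
}
```

With a plain (non-error-checking) mutex the first path would simply hang, which is the <deadlock> in the trace above; the second path never reports completion before model_fsync() has returned.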