On Thu, Sep 17, 2020 at 01:09:42PM +1000, Dave Chinner wrote:
> > > iomap_dio_complete()
> > >   generic_write_sync()
> > >     btrfs_file_fsync()
> > >       inode_lock()
> > >         <deadlock>
> >
> > Can inode_dio_end() be called before generic_write_sync(), as it is done
> > in fs/direct-io.c:dio_complete()?
>
> Don't think so. inode_dio_wait() is supposed to indicate that all
> DIO is complete, and having the "make it stable" parts of an O_DSYNC
> DIO still running after inode_dio_wait() returns means that we still
> have DIO running....
>
> For some filesystems, ensuring the DIO data is stable may involve
> flushing other data (perhaps we did EOF zeroing before the file
> extending DIO) and/or metadata to the log, so we need to guarantee
> these DIO related operations are complete and stable before we say
> the DIO is done.

inode_dio_wait really just waits for active I/O that writes to or reads
from the file.  It does not imply that the I/O is stable, just like
i_rwsem itself doesn't.  Various file systems have historically called
the syncing outside i_rwsem and inode_dio_wait (in fact that is what the
fs/direct-io.c code does, so XFS did as well until a few years ago), and
that isn't a problem at all - we just can't return to userspace (or call
ki_complete for in-kernel users) before the data is stable on disk.
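
To make the ordering concrete, here is a rough user-space sketch, not kernel
code.  Every mock_* function is a hypothetical stand-in that only records
that it ran; none of them is the real helper or has its real signature.  The
only thing it illustrates is the constraint described above: the sync may run
after inode_dio_end(), as long as it finishes before ki_complete (or the
return to userspace).

```c
#include <assert.h>
#include <string.h>

#define MAX_EVENTS 8

/* Log of the order in which the mocked steps ran. */
static const char *events[MAX_EVENTS];
static int nr_events;

static void record(const char *name)
{
	assert(nr_events < MAX_EVENTS);
	events[nr_events++] = name;
}

/* Hypothetical stand-ins for the helpers named in the thread. */
static void mock_inode_dio_end(void)      { record("inode_dio_end"); }
static void mock_generic_write_sync(void) { record("generic_write_sync"); }
static void mock_ki_complete(void)        { record("ki_complete"); }

/*
 * Assumed completion path for a direct write.  Waiters in
 * inode_dio_wait() are released first, so a sync that takes the inode
 * lock afterwards cannot deadlock against them.  Completion is only
 * signalled once the sync is done.
 */
static void mock_dio_complete(int dsync)
{
	mock_inode_dio_end();
	if (dsync)
		mock_generic_write_sync();
	mock_ki_complete();
}

/* Return the position of a step in the log, or -1 if it never ran. */
static int event_index(const char *name)
{
	for (int i = 0; i < nr_events; i++)
		if (strcmp(events[i], name) == 0)
			return i;
	return -1;
}
```

Calling mock_dio_complete(1) and then comparing event_index() for the three
steps shows inode_dio_end first, generic_write_sync second and ki_complete
last.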