On Mon, Oct 21, 2019 at 09:43:30PM +0200, Jan Kara wrote: > On Mon 21-10-19 09:31:12, Theodore Y. Ts'o wrote: > > Hi Matthew, thanks for your work on this patch series! > > > > I applied it against 4c3, and ran a quick test run on it, and found > > the following locking problem. To reproduce: > > > > kvm-xfstests -c nojournal generic/113 > > > > generic/113 [09:27:19][ 5.841937] run fstests generic/113 at 2019-10-21 09:27:19 > > [ 7.959477] > > [ 7.959798] ============================================ > > [ 7.960518] WARNING: possible recursive locking detected > > [ 7.961225] 5.4.0-rc3-xfstests-00012-g7fe6ea084e48 #1238 Not tainted > > [ 7.961991] -------------------------------------------- > > [ 7.962569] aio-stress/1516 is trying to acquire lock: > > [ 7.963129] ffff9fd4791148c8 (&sb->s_type->i_mutex_key#12){++++}, at: __generic_file_fsync+0x3e/0xb0 > > [ 7.964109] > > [ 7.964109] but task is already holding lock: > > [ 7.964740] ffff9fd4791148c8 (&sb->s_type->i_mutex_key#12){++++}, at: ext4_dio_write_iter+0x15b/0x430 > > This is going to be a tricky one. With iomap, the inode locking is handled > by the filesystem while calling generic_write_sync() is done by > iomap_dio_rw(). I would really prefer to avoid tweaking iomap_dio_rw() not > to call generic_write_sync(). You can't remove it from there, because that will break O_DSYNC AIO+DIO. i.e. generic_write_sync() needs to be called before iocb->ki_complete() is called in the AIO completion path, and that means filesystems using iomap_dio_rw need to be be able to run generic_write_sync() without taking the inode_lock(). > So we need to remove inode_lock from > __generic_file_fsync() (used from ext4_sync_file()). This locking is mostly > for legacy purposes and we don't need this in ext4 AFAICT - but removing > the lock from __generic_file_fsync() would mean auditing all legacy > filesystems that use this to make sure flushing inode & its metadata buffer > list while it is possibly changing cannot result in something unexpected. I > don't want to clutter this series with it so we are left with > reimplementing __generic_file_fsync() inside ext4 without inode_lock. Not > too bad but not great either. Thoughts? XFS has implemented it's own ->fsync operation pretty much forever without issues. It's basically: 1. flush dirty cached data w/ WB_SYNC_ALL 2. flush dirty cached metadata (i.e. journal force) 3. flush device caches if journal force didn't, keeping in mind the requirements of data and journal being placed on different devices. The ext4 variant shouldn't need to be any more complex than that... Cheers, Dave. -- Dave Chinner david@xxxxxxxxxxxxx