Re: [PATCH v5 00/12] ext4: port direct I/O to iomap infrastructure

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Mon, Oct 21, 2019 at 09:43:30PM +0200, Jan Kara wrote:
> On Mon 21-10-19 09:31:12, Theodore Y. Ts'o wrote:
> > Hi Matthew, thanks for your work on this patch series!
> > 
> > I applied it against 4c3, and ran a quick test run on it, and found
> > the following locking problem.  To reproduce:
> > 
> > kvm-xfstests -c nojournal generic/113
> > 
> > generic/113		[09:27:19][    5.841937] run fstests generic/113 at 2019-10-21 09:27:19
> > [    7.959477] 
> > [    7.959798] ============================================
> > [    7.960518] WARNING: possible recursive locking detected
> > [    7.961225] 5.4.0-rc3-xfstests-00012-g7fe6ea084e48 #1238 Not tainted
> > [    7.961991] --------------------------------------------
> > [    7.962569] aio-stress/1516 is trying to acquire lock:
> > [    7.963129] ffff9fd4791148c8 (&sb->s_type->i_mutex_key#12){++++}, at: __generic_file_fsync+0x3e/0xb0
> > [    7.964109] 
> > [    7.964109] but task is already holding lock:
> > [    7.964740] ffff9fd4791148c8 (&sb->s_type->i_mutex_key#12){++++}, at: ext4_dio_write_iter+0x15b/0x430
> 
> This is going to be a tricky one. With iomap, the inode locking is handled
> by the filesystem while calling generic_write_sync() is done by
> iomap_dio_rw(). I would really prefer to avoid tweaking iomap_dio_rw() not
> to call generic_write_sync().

You can't remove it from there, because that will break O_DSYNC
AIO+DIO. i.e. generic_write_sync() needs to be called before
iocb->ki_complete() is called in the AIO completion path, and that
means filesystems using iomap_dio_rw need to be be able to run
generic_write_sync() without taking the inode_lock().

> So we need to remove inode_lock from
> __generic_file_fsync() (used from ext4_sync_file()). This locking is mostly
> for legacy purposes and we don't need this in ext4 AFAICT - but removing
> the lock from __generic_file_fsync() would mean auditing all legacy
> filesystems that use this to make sure flushing inode & its metadata buffer
> list while it is possibly changing cannot result in something unexpected. I
> don't want to clutter this series with it so we are left with
> reimplementing __generic_file_fsync() inside ext4 without inode_lock. Not
> too bad but not great either. Thoughts?

XFS has implemented it's own ->fsync operation pretty much forever
without issues. It's basically:

	1. flush dirty cached data w/ WB_SYNC_ALL
	2. flush dirty cached metadata (i.e. journal force)
	3. flush device caches if journal force didn't, keeping in
	mind the requirements of data and journal being placed on
	different devices.

The ext4 variant shouldn't need to be any more complex than that...

Cheers,

Dave.
-- 
Dave Chinner
david@xxxxxxxxxxxxx



[Index of Archives]     [Reiser Filesystem Development]     [Ceph FS]     [Kernel Newbies]     [Security]     [Netfilter]     [Bugtraq]     [Linux FS]     [Yosemite National Park]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Samba]     [Device Mapper]     [Linux Media]

  Powered by Linux