On Tue, Jun 18, 2019 at 04:47:16PM +0200, Andreas Gruenbacher wrote: > Remove the mark_inode_dirty call from __generic_write_end and add it to > generic_write_end and the high-level iomap functions where necessary. > That way, large writes will only dirty inodes at the end instead of > dirtying them once per page. This fixes a performance bottleneck on > gfs2. > > Signed-off-by: Andreas Gruenbacher <agruenba@xxxxxxxxxx> > --- > fs/buffer.c | 26 ++++++++++++++++++-------- > fs/iomap.c | 42 ++++++++++++++++++++++++++++++++++++++---- > 2 files changed, 56 insertions(+), 12 deletions(-) .... > > diff --git a/fs/iomap.c b/fs/iomap.c > index 23ef63fd1669..9454568a7f5e 100644 > --- a/fs/iomap.c > +++ b/fs/iomap.c > @@ -881,6 +881,13 @@ iomap_file_buffered_write(struct kiocb *iocb, struct iov_iter *iter, > { > struct inode *inode = iocb->ki_filp->f_mapping->host; > loff_t pos = iocb->ki_pos, ret = 0, written = 0; > + loff_t old_size; > + > + /* > + * No need to use i_size_read() here, the i_size cannot change under us > + * because we hold i_rwsem. > + */ > + old_size = inode->i_size; > > while (iov_iter_count(iter)) { > ret = iomap_apply(inode, pos, iov_iter_count(iter), > @@ -891,6 +898,9 @@ iomap_file_buffered_write(struct kiocb *iocb, struct iov_iter *iter, > written += ret; > } > > + if (old_size != inode->i_size) > + mark_inode_dirty(inode); > + > return written ? written : ret; > } > EXPORT_SYMBOL_GPL(iomap_file_buffered_write); > @@ -961,18 +971,30 @@ int > iomap_file_dirty(struct inode *inode, loff_t pos, loff_t len, > const struct iomap_ops *ops) > { > + loff_t old_size; > loff_t ret; > > + /* > + * No need to use i_size_read() here, the i_size cannot change under us > + * because we hold i_rwsem. > + */ > + old_size = inode->i_size; > + > while (len) { > ret = iomap_apply(inode, pos, len, IOMAP_WRITE, ops, NULL, > iomap_dirty_actor); > if (ret <= 0) > - return ret; > + goto out; > pos += ret; > len -= ret; > } > + ret = 0; > > - return 0; > +out: > + if (old_size != inode->i_size) > + mark_inode_dirty(inode); I don't think we want to do this. The patches I have that add range locking for XFS allow buffered writes to run concurrently with operations that change the inode size as long as the ranges don't overlap. To do this, XFS will not hold the i_rwsem over any iomap call it makes in future - it will hold a range lock instead. Hence we can have writes and other IO operations occurring at the same time some other operation is changing the size of the file, and that means this code no longer does what you are intending it to do because the inode->i_size is no longer constant across these operations... Hence I think adding code that depends on i_rwsem to be held to function correctly is the wrong direction to be taking the iomap infrastructure. Cheers, Dave. -- Dave Chinner david@xxxxxxxxxxxxx