On Mon, Oct 21, 2019 at 06:18:48PM +0200, Jan Kara wrote: > On Mon 21-10-19 20:20:20, Matthew Bobrowski wrote: > > This patch introduces a new direct I/O write path which makes use of > > the iomap infrastructure. > > > > All direct I/O writes are now passed from the ->write_iter() callback > > through to the new direct I/O handler ext4_dio_write_iter(). This > > function is responsible for calling into the iomap infrastructure via > > iomap_dio_rw(). > > > > Code snippets from the existing direct I/O write code within > > ext4_file_write_iter() such as, checking whether the I/O request is > > unaligned asynchronous I/O, or whether the write will result in an > > overwrite have effectively been moved out and into the new direct I/O > > ->write_iter() handler. > > > > The block mapping flags that are eventually passed down to > > ext4_map_blocks() from the *_get_block_*() suite of routines have been > > taken out and introduced within ext4_iomap_alloc(). > > > > For inode extension cases, ext4_handle_inode_extension() is > > effectively the function responsible for performing such metadata > > updates. This is called after iomap_dio_rw() has returned so that we > > can safely determine whether we need to potentially truncate any > > allocated blocks that may have been prepared for this direct I/O > > write. We don't perform the inode extension, or truncate operations > > from the ->end_io() handler as we don't have the original I/O 'length' > > available there. The ->end_io() however is responsible fo converting > > allocated unwritten extents to written extents. > > > > In the instance of a short write, we fallback and complete the > > remainder of the I/O using buffered I/O via > > ext4_buffered_write_iter(). > > > > The existing buffer_head direct I/O implementation has been removed as > > it's now redundant. > > > > Signed-off-by: Matthew Bobrowski <mbobrowski@xxxxxxxxxxxxxx> > > --- > > fs/ext4/ext4.h | 3 - > > fs/ext4/extents.c | 4 +- > > fs/ext4/file.c | 236 ++++++++++++++++++-------- > > fs/ext4/inode.c | 411 +++++----------------------------------------- > > 4 files changed, 207 insertions(+), 447 deletions(-) > > The patch looks good to me! You can add: > > Reviewed-by: Jan Kara <jack@xxxxxxx> Thanks Jan! :) > One nitpick below: > > > + if (extend) { > > + ret = ext4_handle_inode_extension(inode, ret, offset, count); > > + > > + /* > > + * We may have failed to remove the inode from the orphan list > > + * in the case that the i_disksize got update due to delalloc > > + * writeback while the direct I/O was running. We need to make > > + * sure we remove it from the orphan list as if we've > > + * prematurely popped it onto the list. > > + */ > > + if (!list_empty(&EXT4_I(inode)->i_orphan)) { > > + handle = ext4_journal_start(inode, EXT4_HT_INODE, 2); > > + if (IS_ERR(handle)) { > > + ret = PTR_ERR(handle); > > + if (inode->i_nlink) > > + ext4_orphan_del(NULL, inode); > > + goto out; > > + } > > + > > + if (inode->i_nlink) > > This check can be joined with the list_empty() check above to save us from > unnecessarily starting a transaction. Yes, easy done. > Also I was wondering whether it would not make more sense have this > orphan handling bit also in > ext4_handle_inode_extension(). ext4_dax_write_iter() doesn't > strictly need it (as for DAX i_disksize cannot currently change > while ext4_dax_write_iter() is running) but it would look more > robust to me for the future users and it certainly doesn't hurt > ext4_dax_write_iter() case. I was thinking the same, but to be honest I wasn't entirely sure how it would pan out for the DAX code path. However, seeing as though you don't forsee there being any problems, then I can't really think of a reason not to roll this up into ext4_handle_inode_extension(). So, in ext4_handle_inode_extension() for the initial check against i_disksize, rather than returning 'written' and then having ext4_dio_write_iter() perform the cleanup, we could simply jump to a chunk of code in ext4_handle_inode_extension() and deal with it there, or quite literally just cleanup if that branch is taken there and then seeing as though it's not really needed in any other case? What do you think? --<M>--