On Tue 20-08-24 22:06:57, yangerkun wrote: > From: yangerkun <yangerkun@xxxxxxxxxx> Thanks for debugging this. Couple of spelling fixes first: > Any extended write for ext4 requires the inode to be placed on the ^^^ extending > orphan list before the actual write. In addition, the inode can be > actually removed from the orphan list only after all writes are > completed. Otherwise, those overcommitted blocks (If the allocated ^^ I'd phrase this: Otherwise we'd leave allocated blocks beyond i_disksize if we could not copy all the data into allocated block and e2fsck would complain. > blocks are not written due to certain reasons, the inode size does not > exceed the offset of these blocks) The leak status is always retained, > and fsck reports an alarm for this scenario. > > Currently, the dio and buffer IO comply with this logic. However, the ^^ buffered BTW: The only reason why direct IO doesn't have this problem is because we don't do short writes for direct IO. We either submit all or we return error. > dax write will removed the inode from orphan list since ^^^ remove ^ the orphan ... > ext4_handle_inode_extension is unconditionally called during extend ^^ extending > write. Fix it with this patch. We open the code from > ext4_handle_inode_extension since we want to keep the blocks valid > has been allocated and write success. > > Signed-off-by: yangerkun <yangerkun@xxxxxxxxxx> > --- > fs/ext4/file.c | 35 +++++++++++++++++++++++++++++++---- > 1 file changed, 31 insertions(+), 4 deletions(-) > > diff --git a/fs/ext4/file.c b/fs/ext4/file.c > index be061bb64067..fd8597eef75e 100644 > --- a/fs/ext4/file.c > +++ b/fs/ext4/file.c > @@ -628,11 +628,12 @@ static ssize_t ext4_dio_write_iter(struct kiocb *iocb, struct iov_iter *from) > static ssize_t > ext4_dax_write_iter(struct kiocb *iocb, struct iov_iter *from) > { > - ssize_t ret; > + ssize_t ret, written; > size_t count; > loff_t offset; > handle_t *handle; > bool extend = false; > + bool need_trunc = true; > struct inode *inode = file_inode(iocb->ki_filp); > > if (iocb->ki_flags & IOCB_NOWAIT) { > @@ -668,10 +669,36 @@ ext4_dax_write_iter(struct kiocb *iocb, struct iov_iter *from) > > ret = dax_iomap_rw(iocb, from, &ext4_iomap_ops); > > - if (extend) { > - ret = ext4_handle_inode_extension(inode, offset, ret); > - ext4_inode_extension_cleanup(inode, ret < (ssize_t)count); > + if (!extend) > + goto out; > + > + if (ret <= 0) > + goto err_trunc; > + > + written = ret; > + handle = ext4_journal_start(inode, EXT4_HT_INODE, 2); > + if (IS_ERR(handle)) { > + ret = PTR_ERR(handle); > + goto err_trunc; > } > + > + if (ext4_update_inode_size(inode, offset + written)) { > + ret = ext4_mark_inode_dirty(handle, inode); > + if (unlikely(ret)) { > + ext4_journal_stop(handle); > + goto err_trunc; > + } > + } > + > + if (written == count) > + need_trunc = false; > + > + if (inode->i_nlink) > + ext4_orphan_del(handle, inode); Why did you keep ext4_orphan_del() here? I thought the whole point of this patch is to avoid it? In fact, rather then opencoding ext4_handle_inode_extension() I'd add argument to ext4_handle_inode_extension() like: ext4_handle_inode_extension(inode, pos, written, allocated) and remove inode from the orphan list only if written == allocated. The call site in ext4_dio_write_end_io() would call: /* * For DIO we don't do partial writes so we must have submitted all * that was allocated. */ return ext4_handle_inode_extension(inode, pos, size, size); and the call site in ext4_dax_write_iter() would call: ret = ext4_handle_inode_extension(inode, offset, ret, count); What do you think? Honza -- Jan Kara <jack@xxxxxxxx> SUSE Labs, CR