在 2024/8/28 1:08, Jan Kara 写道:
On Tue 20-08-24 22:06:57, yangerkun wrote:
From: yangerkun <yangerkun@xxxxxxxxxx>
Thanks for debugging this. Couple of spelling fixes first:
Hi,
Thank you for your patient review!
Any extended write for ext4 requires the inode to be placed on the
^^^ extending
orphan list before the actual write. In addition, the inode can be
actually removed from the orphan list only after all writes are
completed. Otherwise, those overcommitted blocks (If the allocated
^^ I'd phrase this: Otherwise we'd leave allocated blocks
beyond i_disksize if we could not copy all the data into allocated block
and e2fsck would complain.
blocks are not written due to certain reasons, the inode size does not
exceed the offset of these blocks) The leak status is always retained,
and fsck reports an alarm for this scenario.
Currently, the dio and buffer IO comply with this logic. However, the
^^ buffered
BTW: The only reason why direct IO doesn't have this problem is because
we don't do short writes for direct IO. We either submit all or we return
error.
Yeah. In fact, the first version in my mind is same as this, don't do
short write for dax too. But thinking deeper, it seems better to keep
the blocks that has been successfully written...
dax write will removed the inode from orphan list since
^^^ remove ^ the orphan ...
ext4_handle_inode_extension is unconditionally called during extend
^^ extending
write. Fix it with this patch. We open the code from
ext4_handle_inode_extension since we want to keep the blocks valid
has been allocated and write success.
Signed-off-by: yangerkun <yangerkun@xxxxxxxxxx>
---
fs/ext4/file.c | 35 +++++++++++++++++++++++++++++++----
1 file changed, 31 insertions(+), 4 deletions(-)
diff --git a/fs/ext4/file.c b/fs/ext4/file.c
index be061bb64067..fd8597eef75e 100644
--- a/fs/ext4/file.c
+++ b/fs/ext4/file.c
@@ -628,11 +628,12 @@ static ssize_t ext4_dio_write_iter(struct kiocb *iocb, struct iov_iter *from)
static ssize_t
ext4_dax_write_iter(struct kiocb *iocb, struct iov_iter *from)
{
- ssize_t ret;
+ ssize_t ret, written;
size_t count;
loff_t offset;
handle_t *handle;
bool extend = false;
+ bool need_trunc = true;
struct inode *inode = file_inode(iocb->ki_filp);
if (iocb->ki_flags & IOCB_NOWAIT) {
@@ -668,10 +669,36 @@ ext4_dax_write_iter(struct kiocb *iocb, struct iov_iter *from)
ret = dax_iomap_rw(iocb, from, &ext4_iomap_ops);
- if (extend) {
- ret = ext4_handle_inode_extension(inode, offset, ret);
- ext4_inode_extension_cleanup(inode, ret < (ssize_t)count);
+ if (!extend)
+ goto out;
+
+ if (ret <= 0)
+ goto err_trunc;
+
+ written = ret;
+ handle = ext4_journal_start(inode, EXT4_HT_INODE, 2);
+ if (IS_ERR(handle)) {
+ ret = PTR_ERR(handle);
+ goto err_trunc;
}
+
+ if (ext4_update_inode_size(inode, offset + written)) {
+ ret = ext4_mark_inode_dirty(handle, inode);
+ if (unlikely(ret)) {
+ ext4_journal_stop(handle);
+ goto err_trunc;
+ }
+ }
+
+ if (written == count)
+ need_trunc = false;
+
+ if (inode->i_nlink)
+ ext4_orphan_del(handle, inode);
Why did you keep ext4_orphan_del() here? I thought the whole point of this
Sorry, I make a mistake here, there should be a truncate before. Thanks
for point out this!
patch is to avoid it? In fact, rather then opencoding
ext4_handle_inode_extension() I'd add argument to
ext4_handle_inode_extension() like:
ext4_handle_inode_extension(inode, pos, written, allocated)
and remove inode from the orphan list only if written == allocated. The
call site in ext4_dio_write_end_io() would call:
/*
* For DIO we don't do partial writes so we must have submitted all
* that was allocated.
*/
return ext4_handle_inode_extension(inode, pos, size, size);
and the call site in ext4_dax_write_iter() would call:
ret = ext4_handle_inode_extension(inode, offset, ret, count);
What do you think?
Great! This seems more clearly and I think it should works too. Whould I
send a v2 patch for this?
Thanks,
Erkun.
Honza