On 2024/7/15 20:33, Baokun Li wrote:
Hi Ojaswin!
On 2024/7/14 23:42, Ojaswin Mujoo wrote:
On Wed, Jul 10, 2024 at 12:06:36PM +0800, libaokun@xxxxxxxxxxxxxxx
wrote:
From: Baokun Li <libaokun1@xxxxxxxxxx>
In ext4_ext_rm_idx() and ext4_ext_correct_indexes(), there is no proper
rollback of already executed updates when updating a level of the
extents
path fails, so we may get an inconsistent extents tree, which may
trigger
some bad things in errors=continue mode.
Hence clear the verified bit of modified extents buffers if the tree
fails
to be updated in ext4_ext_rm_idx() or ext4_ext_correct_indexes(), which
forces the extents buffers to be checked in
ext4_valid_extent_entries(),
ensuring that the extents tree is consistent.
Signed-off-by: zhanchengbin <zhanchengbin1@xxxxxxxxxx>
Link:
https://lore.kernel.org/r/20230213080514.535568-3-zhanchengbin1@xxxxxxxxxx/
Signed-off-by: Baokun Li <libaokun1@xxxxxxxxxx>
---
fs/ext4/extents.c | 31 +++++++++++++++++++++++++++----
1 file changed, 27 insertions(+), 4 deletions(-)
diff --git a/fs/ext4/extents.c b/fs/ext4/extents.c
index bff3666c891a..4d589d34b30e 100644
--- a/fs/ext4/extents.c
+++ b/fs/ext4/extents.c
@@ -1749,12 +1749,23 @@ static int ext4_ext_correct_indexes(handle_t
*handle, struct inode *inode,
break;
err = ext4_ext_get_access(handle, inode, path + k);
if (err)
- break;
+ goto clean;
path[k].p_idx->ei_block = border;
err = ext4_ext_dirty(handle, inode, path + k);
if (err)
- break;
+ goto clean;
}
+ return 0;
+
+clean:
+ /*
+ * The path[k].p_bh is either unmodified or with no verified bit
+ * set (see ext4_ext_get_access()). So just clear the verified bit
+ * of the successfully modified extents buffers, which will force
+ * these extents to be checked to avoid using inconsistent data.
+ */
+ while (++k < depth)
+ clear_buffer_verified(path[k].p_bh);
return err;
}
@@ -2312,12 +2323,24 @@ static int ext4_ext_rm_idx(handle_t *handle,
struct inode *inode,
break;
err = ext4_ext_get_access(handle, inode, path + k);
if (err)
- break;
+ goto clean;
path[k].p_idx->ei_block = path[k + 1].p_idx->ei_block;
err = ext4_ext_dirty(handle, inode, path + k);
if (err)
- break;
+ goto clean;
}
+ return 0;
+
+clean:
+ /*
+ * The path[k].p_bh is either unmodified or with no verified bit
+ * set (see ext4_ext_get_access()). So just clear the verified bit
+ * of the successfully modified extents buffers, which will force
+ * these extents to be checked to avoid using inconsistent data.
+ */
+ while (++k < depth)
+ clear_buffer_verified(path[k].p_bh);
+
return err;
}
Hi Baokun,
So I wanted to understand the extent handling paths for a whil and
thought this
patchset was a good chance to finally take sometime and do that.
I do have a question based on my understanding of this extent
deletion code:
So IIUC, ext4_find_extent() will return a path where buffer of each
node is
verified (via bh = read_extent_tree_block()). So imagine we have the
following
path (d=depth, blk=idx.ei_block, v=verified, nv=not-verified):
+------+ +------+ +------+ +------+ +------+
|d=0 | |d=1 | |d=2 | |d=3 | | |
|blk=1 | --> |blk=1 | --> |blk=1 | --> |blk=1 | --> |pblk=1|
|(v) | |(v) | |(v) | |(v) | | |
+------+ +------+ +------+ +------+ +------+
|d=3 | +------+
|blk=2 | --> | |
|(v) | |pblk=2|
+------+ | |
+------+
Here, the the last column are the leaf nodes with only 1 extent in
them. Now,
say we want to punch the leaf having pblk=1. We'll eventually call
ext4_ext_rm_leaf() -> ext4_ext_rm_idx() to remove the index at depth
= 3 and
try fixin up the indices in path with ei_block = 2
Suppose we error out at depth == 1. After the cleanup (introduced in
this
patch), we'll mark depth = 1 to 4 as non verified:
+------+ +------+ +------+ +------+ +------+
|d=0 | |d=1 | |d=2 | |d=3 | | |
|blk=1 | --> |blk=1 | --> |blk=2 | --> |blk=2 | --> |pblk=2|
|(v) | |(nv) | |(nv) | |(nv) | |(nv) |
+------+ +------+ +------+ +------+ +------+
Exactly right!
And we return and we won't actually mark the FS corrupt until we try
to check
the bh at depth = 1 above. In this case, the first node is still
pointing to
wrong ei_block but is marked valid. Aren't we silently leaving the
tree in an
inconsistent state which might lead to corrupted lookups until we
actually
touch the affected bh and realise that there's a corruption.
Am I missing some codepath here? Should we maybe try to clear_valid
for the
whole tree?
(I hope the formatting of diagram comes out correct :) )
Uh, I'm sorry, my diagram is disordered. 😅
Regards,
ojaswin
But the journal will ensure the consistency of the extents path after
this patch.
When ext4_ext_get_access() or ext4_ext_dirty() returns an error in
ext4_ext_rm_idx() and ext4_ext_correct_indexes(), this may cause
the extents tree to be inconsistent. But the inconsistency just
exists in memory and doesn't land on disk.
For ext4_ext_get_access(), the handle must have been aborted
when it returned an error, as follows:
ext4_ext_get_access
ext4_journal_get_write_access
__ext4_journal_get_write_access
err = jbd2_journal_get_write_access
if (err)
ext4_journal_abort_handle
For ext4_ext_dirty(), since path->p_bh must not be null and handle
must be valid, handle is aborted anyway when an error is returned:
ext4_ext_dirty
__ext4_ext_dirty
if (path->p_bh)
__ext4_handle_dirty_metadata
if (ext4_handle_valid(handle))
err = jbd2_journal_dirty_metadata
if (!is_handle_aborted(handle) && WARN_ON_ONCE(err))
ext4_journal_abort_handle
Thus the extents tree will only be inconsistent in memory, so only
the verified bit of the modified buffer needs to be cleared to avoid
these inconsistent data being used in memory.
Regards,
Baokun