On 2023/6/12 11:04, Theodore Ts'o wrote:
On Sat, Jun 10, 2023 at 03:03:19PM -0400, Theodore Ts'o wrote:
Unfortunately, the changes to ext4_insert_delayed_block() in this
patch were buggy, and were causing tests to hang when running
ext4/encrypt, ext4/bigalloc_4k, and ext4/bigalloc_1k test scenarios.
A bisect using "gce-xfstests -c ext4/bigalloc_4k -C 5 generic/579"
pinpointed the problem.
I'm very sorry, I didn't turn on encrypt or bigalloc when I tested it.
The problem is that ext4_clu_mapped can return a positive value, and
so there are times when we do need to release the space even though
there are no errors.
Yes, ext4_clu_mapped may return a positive value,
but when it does, reserved is false and we never need to release the space.
So I've fixed up your commit with the following changes. With this
change, the test regressions go away.
The previous reply was very confusing to me because the changes
in the previous reply have nothing to do with ext4_clu_mapped
and ret is always 0 when reserved is true, so we don't need
ext4_da_release_space to perform a rollback.
It turns out my fix was not correct, because I misread the fundamental
problem with the patch. The issue was in the last patch hunk:
- ret = ext4_es_insert_delayed_block(inode, lblk, allocated);
- if (ret && reserved)
- ext4_da_release_space(inode, 1);
-
+ ext4_es_insert_delayed_block(inode, lblk, allocated);
errout:
return ret;
}
Indeed, there is a behavioral change in ret here.
Before modification:
ext4_da_map_blocks --> return 0
ext4_insert_delayed_block --> return 0
ext4_clu_mapped --> return 1
ext4_es_insert_delayed_block --> return 0
After modification:
ext4_da_map_blocks --> return 1
ext4_insert_delayed_block --> return 1
ext4_clu_mapped --> return 1
ext4_es_insert_delayed_block --> void
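To make the difference in ret concrete, here is a minimal user-space sketch of the bigalloc branch only, under the assumption that both ext4_es_scan_clu() checks came back false. All of the model_* helpers and the insert_delayed_block_* names are hypothetical stand-ins, not kernel functions; they only mimic the return values listed above.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for ext4_clu_mapped(): 1 if the cluster is mapped, else 0. */
static int model_clu_mapped(bool cluster_is_mapped)
{
	return cluster_is_mapped ? 1 : 0;
}

/* Hypothetical stand-in for ext4_da_reserve_space(); assumed to succeed here. */
static int model_reserve_space(void)
{
	return 0;
}

/* Hypothetical stand-in for ext4_es_insert_delayed_block(); assumed to succeed. */
static int model_es_insert(void)
{
	return 0;
}

/* Before the series: the final assignment overwrites any positive ret. */
static int insert_delayed_block_before(bool cluster_is_mapped)
{
	int ret = model_clu_mapped(cluster_is_mapped);

	if (ret < 0)
		return ret;
	if (ret == 0) {
		ret = model_reserve_space();
		if (ret != 0)
			return ret;
	}
	ret = model_es_insert();
	return ret;
}

/* Buggy version: the insert result is no longer assigned, so a stale 1 escapes. */
static int insert_delayed_block_buggy(bool cluster_is_mapped)
{
	int ret = model_clu_mapped(cluster_is_mapped);

	if (ret < 0)
		return ret;
	if (ret == 0) {
		ret = model_reserve_space();
		if (ret != 0)
			return ret;
	}
	(void)model_es_insert();
	return ret;
}

/* Fixed version: error paths return early, the success path returns 0. */
static int insert_delayed_block_fixed(bool cluster_is_mapped)
{
	int ret = model_clu_mapped(cluster_is_mapped);

	if (ret < 0)
		return ret;
	if (ret == 0) {
		ret = model_reserve_space();
		if (ret != 0)
			return ret;
	}
	(void)model_es_insert();
	return 0;
}

/* Checks the three variants against the call chains listed above. */
static void model_self_check(void)
{
	assert(insert_delayed_block_before(true) == 0);
	assert(insert_delayed_block_buggy(true) == 1);
	assert(insert_delayed_block_fixed(true) == 0);
	assert(insert_delayed_block_buggy(false) == 0);
}
```

Calling model_self_check() from a test main() exercises all three variants. In this model only the mapped-cluster case differs, which matches the hang showing up in the bigalloc test scenarios.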
The problem is that on entering this code hunk, ret could be non-zero.
But once we made ext4_es_insert_delayed_block() return void, nothing
reset ret to 0 before the return. So the changes to fs/ext4/inode.c
needed to be replaced by this:
diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
index 129b9af53d62..7700db1782dd 100644
--- a/fs/ext4/inode.c
+++ b/fs/ext4/inode.c
@@ -1630,7 +1630,6 @@ static int ext4_insert_delayed_block(struct inode *inode, ext4_lblk_t lblk)
struct ext4_sb_info *sbi = EXT4_SB(inode->i_sb);
int ret;
bool allocated = false;
- bool reserved = false;
/*
* If the cluster containing lblk is shared with a delayed,
@@ -1646,8 +1645,7 @@ static int ext4_insert_delayed_block(struct inode *inode, ext4_lblk_t lblk)
if (sbi->s_cluster_ratio == 1) {
ret = ext4_da_reserve_space(inode);
if (ret != 0) /* ENOSPC */
- goto errout;
- reserved = true;
+ return ret;
} else { /* bigalloc */
if (!ext4_es_scan_clu(inode, &ext4_es_is_delonly, lblk)) {
if (!ext4_es_scan_clu(inode,
@@ -1655,12 +1653,11 @@ static int ext4_insert_delayed_block(struct inode *inode, ext4_lblk_t lblk)
ret = ext4_clu_mapped(inode,
EXT4_B2C(sbi, lblk));
if (ret < 0)
- goto errout;
+ return ret;
if (ret == 0) {
ret = ext4_da_reserve_space(inode);
if (ret != 0) /* ENOSPC */
- goto errout;
- reserved = true;
+ return ret;
} else {
allocated = true;
}
@@ -1670,12 +1667,8 @@ static int ext4_insert_delayed_block(struct inode *inode, ext4_lblk_t lblk)
}
}
- ret = ext4_es_insert_delayed_block(inode, lblk, allocated);
- if (ret && reserved)
- ext4_da_release_space(inode, 1);
-
-errout:
- return ret;
+ ext4_es_insert_delayed_block(inode, lblk, allocated);
+ return 0;
}
/*
- Ted
Yes, it looks very good! A million thanks for the fix!
I am very sorry for taking your time to locate and fix this issue!
I will do more checks later.
--
With Best Regards,
Baokun Li