Re: [PATCH 1/2] ext4: fix bug_on ext4_mb_use_inode_pa

Lukas Czerner <lczerner@xxxxxxxxxx> · Mon, 23 May 2022 11:58:03 +0200

On Sat, May 21, 2022 at 09:42:16PM +0800, Baokun Li wrote:
> Hulk Robot reported a BUG_ON:
> ==================================================================
> kernel BUG at fs/ext4/mballoc.c:3211!
> [...]
> RIP: 0010:ext4_mb_mark_diskspace_used.cold+0x85/0x136f
> [...]
> Call Trace:
>  ext4_mb_new_blocks+0x9df/0x5d30
>  ext4_ext_map_blocks+0x1803/0x4d80
>  ext4_map_blocks+0x3a4/0x1a10
>  ext4_writepages+0x126d/0x2c30
>  do_writepages+0x7f/0x1b0
>  __filemap_fdatawrite_range+0x285/0x3b0
>  file_write_and_wait_range+0xb1/0x140
>  ext4_sync_file+0x1aa/0xca0
>  vfs_fsync_range+0xfb/0x260
>  do_fsync+0x48/0xa0
> [...]
> ==================================================================
> 
> Above issue may happen as follows:
> -------------------------------------
> do_fsync
>  vfs_fsync_range
>   ext4_sync_file
>    file_write_and_wait_range
>     __filemap_fdatawrite_range
>      do_writepages
>       ext4_writepages
>        mpage_map_and_submit_extent
>         mpage_map_one_extent
>          ext4_map_blocks
>           ext4_mb_new_blocks
>            ext4_mb_normalize_request
>             >>> start + size <= ac->ac_o_ex.fe_logical
>            ext4_mb_regular_allocator
>             ext4_mb_simple_scan_group
>              ext4_mb_use_best_found
>               ext4_mb_new_preallocation
>                ext4_mb_new_inode_pa
>                 ext4_mb_use_inode_pa
>                  >>> set ac->ac_b_ex.fe_len <= 0
>            ext4_mb_mark_diskspace_used
>             >>> BUG_ON(ac->ac_b_ex.fe_len <= 0);
> 
> we can easily reproduce this problem with the following commands:
> 	`fallocate -l100M disk`
> 	`mkfs.ext4 -b 1024 -g 256 disk`
> 	`mount disk /mnt`
> 	`fsstress -d /mnt -l 0 -n 1000 -p 1`
> 
> The size must be smaller than or equal to EXT4_BLOCKS_PER_GROUP.
> Therefore, "start + size <= ac->ac_o_ex.fe_logical" may occur
> when the size is truncated. So start should be the start position of
> the group where ac_o_ex.fe_logical is located after alignment.
> In addition, when the value of fe_logical or EXT4_BLOCKS_PER_GROUP
> is very large, the value calculated by start_off is more accurate.
> 
> Fixes: cd648b8a8fd5 ("ext4: trim allocation requests to group size")
> Reported-by: Hulk Robot <hulkci@xxxxxxxxxx>
> Signed-off-by: Baokun Li <libaokun1@xxxxxxxxxx>
> ---
>  fs/ext4/mballoc.c | 11 +++++++++++
>  1 file changed, 11 insertions(+)
> 
> diff --git a/fs/ext4/mballoc.c b/fs/ext4/mballoc.c
> index ea653d19f9ec..32410b79b664 100644
> --- a/fs/ext4/mballoc.c
> +++ b/fs/ext4/mballoc.c
> @@ -4107,6 +4107,17 @@ ext4_mb_normalize_request(struct ext4_allocation_context *ac,
>  	size = size >> bsbits;
>  	start = start_off >> bsbits;
>  
> +	/*
> +	 * Because size must be less than or equal to
> +	 * EXT4_BLOCKS_PER_GROUP, start should be the start position of
> +	 * the group where ac_o_ex.fe_logical is located after alignment.
> +	 * In addition, when the value of fe_logical or
> +	 * EXT4_BLOCKS_PER_GROUP is very large, the value calculated
> +	 * by start_off is more accurate.
> +	 */
> +	start = max(start, round_down(ac->ac_o_ex.fe_logical,
> +			EXT4_BLOCKS_PER_GROUP(ac->ac_sb)));

This does not look right. The second argument in round_down() must be a
power of two, but there is no such restriction on blocks per group.

Also I am not quite sure why do we adjust the start in this way at all?
If we found what seems to be a preallocated extent which we can use and
we're actually going to use 0 lenght extent it seems like the problem is
somewhere else? Can you desribe the problem a bit more in detail?

Maybe I need to look at the ext4_mb_normalize_request() some more.

-Lukas

> +
>  	/* don't cover already allocated blocks in selected range */
>  	if (ar->pleft && start <= ar->lleft) {
>  		size -= ar->lleft + 1 - start;
> -- 
> 2.31.1
>