Re: [PATCH RFC] ext4: Validate inode pa before using preallocation blocks

Zhihao Cheng <chengzhihao1@xxxxxxxxxx> · Mon, 11 Mar 2024 15:31:32 +0800

On 2024/3/11 14:38, Zhihao Cheng wrote:
In ext4 errors=continue & no-journal mode, physical blocks can be
allocated more than once during preallocation (when writing extent
entries fails and the extent cache is reclaimed), which can trigger
BUG_ON(pa->pa_free < len) in ext4_mb_use_inode_pa().

  kernel BUG at fs/ext4/mballoc.c:4681!
  invalid opcode: 0000 [#1] PREEMPT SMP
  CPU: 3 PID: 97 Comm: kworker/u8:3 Not tainted 6.8.0-rc7
  RIP: 0010:ext4_mb_use_inode_pa+0x1b6/0x1e0
  Call Trace:
   ext4_mb_use_preallocated.constprop.0+0x19e/0x540
   ext4_mb_new_blocks+0x220/0x1f30
   ext4_ext_map_blocks+0xf3c/0x2900
   ext4_map_blocks+0x264/0xa40
   ext4_do_writepages+0xb15/0x1400
   do_writepages+0x8c/0x260
   writeback_sb_inodes+0x224/0x720
   wb_writeback+0xd8/0x580
   wb_workfn+0x148/0x820

The details are as follows:

0. Given a file with i_size=4096 and one mapped block.
1. Write block 1; blocks 1~3 are preallocated.
    ext4_ext_map_blocks
     ext4_mb_normalize_request
      size = 16 * 1024
      size = end - start // Allocate 3 blocks (bs = 4096)
     ext4_mb_regular_allocator
      ext4_mb_regular_allocator
      ext4_mb_regular_allocator
      ext4_mb_use_inode_pa
       pa->pa_free -= len // 3 - 1 = 2
2. Writing the extent buffer head fails; the es cache and buffer head
    are reclaimed.
3. Write blocks 1~3
    ext4_ext_map_blocks
     newex.ee_len = 3
     ext4_ext_check_overlap // Finds nothing, though block 1 should be there
     allocated = map->m_len  // 3
     ext4_mb_new_blocks
      ext4_mb_use_preallocated
       ext4_mb_use_inode_pa
        BUG_ON(pa->pa_free < len) // 2 < 3!
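
The accounting in steps 1~3 can be reproduced with a small userspace
model. This is only a sketch: struct model_pa, model_pa_new() and
model_use_inode_pa() are hypothetical stand-ins, not kernel code, and
the model reports failure where the kernel would hit the BUG_ON.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the pa_free field of an inode pa. */
struct model_pa {
	int pa_free;	/* blocks still unused in the preallocation */
};

/* Create a preallocation covering len blocks. */
static struct model_pa model_pa_new(int len)
{
	struct model_pa pa = { .pa_free = len };

	assert(len > 0);
	return pa;
}

/*
 * Consume len blocks from the pa. Returns false in the situation where
 * the kernel would trigger BUG_ON(pa->pa_free < len).
 */
static bool model_use_inode_pa(struct model_pa *pa, int len)
{
	if (pa->pa_free < len)
		return false;
	pa->pa_free -= len;
	return true;
}
```

Creating a pa of 3 blocks and using 1 leaves pa_free at 2; a second
request for 3 blocks (possible only because the mapping of block 1 was
lost) then fails the check.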

Fix it by validating the inode pa before use. If an invalid pa is
detected, stop using inode preallocation, drop the invalid pa so that it
is not used again, and mark the group block bitmap as corrupted to avoid
allocating from the erroneous group.
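
The three actions described above can be sketched in userspace as
follows. All names here are hypothetical, and the validity test is
assumed to be simply pa_free >= len, which is a simplification and may
not match what the patch actually checks.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the per-group state. */
struct model_group {
	bool bitmap_corrupt;	/* group block bitmap marked corrupted */
};

/* Hypothetical stand-in for an inode pa. */
struct model_inode_pa {
	int pa_free;		/* blocks still unused */
	bool pa_deleted;	/* pa dropped, must not be used again */
	struct model_group *grp;
};

/*
 * Try to take len blocks from the pa. Returns the number of blocks
 * taken, or 0 if the pa is unusable and the caller should fall back
 * to the regular allocator.
 */
static int model_try_use_inode_pa(struct model_inode_pa *pa, int len)
{
	assert(len > 0);

	if (pa->pa_deleted)
		return 0;

	if (pa->pa_free < len) {		/* assumed validity check */
		pa->pa_deleted = true;		/* drop the invalid pa */
		pa->grp->bitmap_corrupt = true;	/* avoid this group */
		return 0;
	}

	pa->pa_free -= len;
	return len;
}
```

Returning 0 rather than crashing lets the caller continue without the
inode pa, at the cost of leaving the group unused until it is repaired.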

After the group block bitmap is marked corrupted,
mpage_map_and_submit_extent returns -EFSCORRUPTED from ext4_map_blocks ->
ext4_ext_map_blocks -> ext4_mb_new_blocks -> ext4_mb_regular_allocator ->
ext4_mb_find_by_goal -> ext4_mb_load_buddy -> ext4_mb_init_cache ->
ext4_wait_block_bitmap -> ext4_validate_block_bitmap ->
EXT4_MB_GRP_BBITMAP_CORRUPT(grp).
I think the 'EXT4_MB_GRP_BBITMAP_CORRUPT(e4b->bd_info)' check is not
needed in ext4_mb_load_buddy, because all callers have already checked
it before using e4b. In this case (ext4_mb_regular_allocator), the goal
group can be skipped if it is corrupted, so ext4_mb_find_by_goal should
load the buddy (ext4_mb_load_buddy) without the corruption check and
then check for corruption itself, returning 0. But we can't simply
delete the EXT4_MB_GRP_BBITMAP_CORRUPT(grp) check from
ext4_validate_block_bitmap, because some ext4_wait_block_bitmap callers
may still need it. IOW, some code paths need the check and some don't.
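
To make the difference concrete, here is a rough userspace model of the
two behaviours. It is only an illustration: every name is hypothetical,
the error values are local constants, and the real allocator is far
more involved.

```c
#include <assert.h>
#include <stdbool.h>

#define MODEL_EFSCORRUPTED	117	/* local stand-in for EFSCORRUPTED */
#define MODEL_ENOSPC		28	/* local stand-in for ENOSPC */

struct model_grp {
	bool bbitmap_corrupt;	/* block bitmap marked corrupted */
	bool has_free;		/* group has free blocks */
};

/*
 * check_in_load == true models the current behaviour, where loading the
 * buddy of a corrupted goal group fails. check_in_load == false models
 * the suggested behaviour, where the goal group is skipped with 0.
 */
static int model_find_by_goal(const struct model_grp *goal,
			      bool check_in_load, bool *found)
{
	*found = false;
	if (goal->bbitmap_corrupt)
		return check_in_load ? -MODEL_EFSCORRUPTED : 0;
	*found = goal->has_free;
	return 0;
}

/* Returns the index of the group allocated from, or a negative error. */
static int model_regular_allocator(const struct model_grp *groups,
				   int ngroups, int goal, bool check_in_load)
{
	bool found;
	int err, i;

	assert(goal >= 0 && goal < ngroups);

	err = model_find_by_goal(&groups[goal], check_in_load, &found);
	if (err)
		return err;	/* the whole allocation fails */
	if (found)
		return goal;

	/* Scan the remaining groups, skipping corrupted ones. */
	for (i = 0; i < ngroups; i++) {
		if (i == goal || groups[i].bbitmap_corrupt)
			continue;
		if (groups[i].has_free)
			return i;
	}
	return -MODEL_ENOSPC;
}
```

With a corrupted goal group 0 and a healthy group 1, the first variant
fails the allocation with the corruption error, while the second one
allocates from group 1.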

The above problem is independent of the problem solved by this patch, so
I have sent out the patch.

A reproducer is available at the Link below.

Cc: stable@xxxxxxxxxxxxxxx
Link: https://bugzilla.kernel.org/show_bug.cgi?id=218576
Signed-off-by: Zhihao Cheng <chengzhihao1@xxxxxxxxxx>
Signed-off-by: Zhang Yi <yi.zhang@xxxxxxxxxx>
---
  fs/ext4/mballoc.c | 128 +++++++++++++++++++++++++++++++++++-----------
  1 file changed, 98 insertions(+), 30 deletions(-)