kernel BUG at fs/ext4/mballoc.c:1911!

Liu Bo <bo.liu@xxxxxxxxxxxxxxxxx> · Fri, 30 Mar 2018 06:15:40 +0800

Hi,

I got the following crash from ext4, although I got a partially dumped
vmcore, I'm not able to find out the root cause, any ideas are
appreciated.

The kernel is 4.9.0.

[943102.751128] EXT4-fs error (device nvme0n1p1): ext4_wait_block_bitmap:503: comm fsabc: Cannot read block bitmap - block_group = 8383, block_bitmap = 274202639
[943102.751131] EXT4-fs (nvme0n1p1): previous I/O error to superblock detected
[943102.751134] Buffer I/O error on dev nvme0n1p1, logical block 0, lost sync page write
[943102.751298] ------------[ cut here ]------------
[943102.751299] kernel BUG at fs/ext4/mballoc.c:1911!
[943102.751300] invalid opcode: 0000 [#1] SMP
...
[943102.751316] task: ffff887be5fabe00 task.stack: ffffc90037994000
[943102.751339] RIP: 0010:[<ffffffffa05336dc>]  [<ffffffffa05336dc>] ext4_mb_simple_scan_group+0x14c/0x160 [ext4]
[943102.751340] RSP: 0018:ffffc90037997820  EFLAGS: 00010246
[943102.751340] RAX: 0000000000000008 RBX: 000000000000000c RCX: 0000000000000008
[943102.751341] RDX: 0000000000000040 RSI: 0000000000000038 RDI: ffff88634047aff8
[943102.751342] RBP: ffffc90037997860 R08: ffff000000000000 R09: 000015f80607a517
[943102.751342] R10: ffff887e3f89c000 R11: ffff886ff7a54928 R12: ffff887d002cde80
[943102.751343] R13: ffff887e3cabc000 R14: ffffc90037997898 R15: 0000000000000030
[943102.751344] FS:  00007f1da6eff700(0000) GS:ffff887e7e680000(0000) knlGS:0000000000000000
[943102.751345] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[943102.751345] CR2: 00007f1d78008708 CR3: 0000007be3012000 CR4: 00000000007606f0
[943102.751346] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[943102.751347] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[943102.751347] PKRU: 55555554
[943102.751347] Stack:
[943102.751350]  000000080000417f 0000000800000001 6a0d0bf01fb48b22 ffff887d002cde80
[943102.751351]  00000000000020bf 0000000000000000 0000000000000000 ffff887e3cabc000
[943102.751352]  ffffc90037997900 ffffffffa05340a6 00000000379978a0 ffff887e3cabd000
[943102.751353] Call Trace:
[943102.751363]  [<ffffffffa05340a6>] ext4_mb_regular_allocator+0x356/0x460 [ext4]
[943102.751371]  [<ffffffffa0535b9c>] ext4_mb_new_blocks+0x5ec/0xaf0 [ext4]
[943102.751379]  [<ffffffffa052434d>] ? __read_extent_tree_block+0x5d/0x1f0 [ext4]
[943102.751386]  [<ffffffffa0525633>] ? ext4_find_extent+0x143/0x2d0 [ext4]
[943102.751394]  [<ffffffffa052a7de>] ext4_ext_map_blocks+0xb5e/0xf30 [ext4]
[943102.751397]  [<ffffffff811bb75c>] ? node_dirty_ok+0x12c/0x170
[943102.751403]  [<ffffffffa04f7802>] ext4_map_blocks+0x172/0x600 [ext4]
[943102.751406]  [<ffffffff8127a8c1>] ? alloc_buffer_head+0x21/0x60
[943102.751407]  [<ffffffff81233601>] ? mem_cgroup_commit_charge+0x91/0x530
[943102.751413]  [<ffffffffa04f7d22>] _ext4_get_block+0x92/0x100 [ext4]
[943102.751419]  [<ffffffffa04f7da6>] ext4_get_block+0x16/0x20 [ext4]
[943102.751420]  [<ffffffff8127d357>] __block_write_begin_int+0x197/0x5e0
[943102.751425]  [<ffffffffa04f7d90>] ? _ext4_get_block+0x100/0x100 [ext4]
[943102.751432]  [<ffffffffa04fcb56>] ? ext4_write_begin+0x126/0x5b0 [ext4]
[943102.751433]  [<ffffffff8127d7b1>] __block_write_begin+0x11/0x20
[943102.751439]  [<ffffffffa04fcbdc>] ext4_write_begin+0x1ac/0x5b0 [ext4]
[943102.751446]  [<ffffffffa052cfdd>] ? __ext4_journal_stop+0x3d/0xa0 [ext4]
[943102.751449]  [<ffffffff811ab578>] generic_perform_write+0xc8/0x1c0
[943102.751451]  [<ffffffff8125f37e>] ? file_update_time+0x5e/0x110
[943102.751452]  [<ffffffff811adbb5>] __generic_file_write_iter+0x185/0x1d0
[943102.751458]  [<ffffffffa04f1a6b>] ext4_file_write_iter+0x8b/0x380 [ext4]
[943102.751460]  [<ffffffff81247429>] ? vfs_getattr_nosec+0x29/0x40
[943102.751462]  [<ffffffff81247c6f>] ? cp_new_stat+0x14f/0x180
[943102.751463]  [<ffffffff81241115>] __vfs_write+0xe5/0x160
[943102.751464]  [<ffffffff812423b5>] vfs_write+0xb5/0x1a0
[943102.751465]  [<ffffffff81243875>] SyS_write+0x55/0xc0
[943102.751468]  [<ffffffff8171a6da>] entry_SYSCALL_64_fastpath+0x1a/0xc5
[943102.751476] Code: 39 44 24 3c 75 27 49 8b 85 60 <0f> 0b 0f 0b e8 3b 57 b5 e0 90 66 
[943102.751484] RIP  [<ffffffffa05336dc>] ext4_mb_simple_scan_group+0x14c/0x160 [ext4]
[943102.751484]  RSP <ffffc90037997820>

Here is my diagnosis so far,

The code of interest is
---------------------------
static noinline_for_stack
void ext4_mb_simple_scan_group(struct ext4_allocation_context *ac,
                                        struct ext4_buddy *e4b)
{
        struct super_block *sb = ac->ac_sb;
        struct ext4_group_info *grp = e4b->bd_info;
        void *buddy;
        int i;
        int k;
        int max;

        BUG_ON(ac->ac_2order <= 0);
        for (i = ac->ac_2order; i <= sb->s_blocksize_bits + 1; i++) {
                if (grp->bb_counters[i] == 0)
                        continue;

                buddy = mb_find_buddy(e4b, i, &max);
                BUG_ON(buddy == NULL);

                k = mb_find_next_zero_bit(buddy, max, 0);
                BUG_ON(k >= max);

                ac->ac_found++;

                ac->ac_b_ex.fe_len = 1 << i;
                ac->ac_b_ex.fe_start = k << i;
                ac->ac_b_ex.fe_group = e4b->bd_group;
-------------------------------------
(a) ac->ac_2order = 11 as 
crash> grep ac_2order ac.txt
  ac_2order = 11 '\v', 

(b) ac_b_ex.fe_start = 0, so order = 11 has been skipped in continue;
crash> grep ac_b_ex -A 5 ac.txt
  ac_b_ex = {
    fe_logical = 2048, 
    fe_start = 0, 
    fe_group = 0, 
    fe_len = 0
  }, 

(c) grp->bb_counters[11] = 0, so we head to i = 12

(d) 
Inside the buddy bitmap in memory, 'order=12' has 1 free block, i.e. bb_counters[12] = 1.
-----------
crash> grep bb_counter grp.txt
  bb_counters = 0xffff887e101a9648
crash> rd 0xffff887e101a9648 -32 16
ffff887e101a9648:  00000013 fffffe21 ffffff33 ffffffbb   ....!...3.......
ffff887e101a9658:  fffffffa fffffff8 00000007 00000002   ................
ffff887e101a9668:  0000000a 00000005 00000007 00000000   ................
ffff887e101a9678:  00000001 00000001 00000000 00000000   ................
-----------

But...
k = mb_find_next_zero_bit(buddy, max, 0);
BUG_ON(k >= max)   --> kernel BUG at fs/ext4/mballoc.c:1911!

k = max = 8,
the buddy bitmap shows every bit is 1, i.e. no free block is available.

(e) there're errors about reading this bitmap(group 8383) shown in the log,
crash> grep group e4b.txt
  bd_group = 8383

 however when it comes to BUG_ON(k >= max), reading this bitmap has
 been successful, and it is the inconsistence between ->bb_counters
 and the buddy bitmap that ends up with the crash, but if the buddy
 bitmap was regenerated, bb_counters should match with the buddy
 bitmap.

Anyway, can anyone take a look at it, please?

thanks,
-liubo