Hi list,

I have a recurring issue with raid6 which results in a panic due to a BUG_ON(sh->batch_head). Last weekend, the issue occurred during a weekly raid-check. The raid volumes (12 in total) are fairly new; no mismatches or hardware errors have been detected.

[535089.369357] kernel BUG at drivers/md/raid5.c:527!
[535089.374700] invalid opcode: 0000 [#1] SMP
[535089.379384] Modules linked in: ...
[535089.503509] CPU: 34 PID: 280061 Comm: md0_resync Tainted: G OE ------------ 3.10.0-514.10.2.el7_lustre.x86_64 #1

This is the backtrace:

crash> bt 280061
PID: 280061  TASK: ffff8800757cde20  CPU: 34  COMMAND: "md0_resync"
 #0 [ffff88024e217830] machine_kexec at ffffffff81059bdb
 #1 [ffff88024e217890] __crash_kexec at ffffffff81105382
 #2 [ffff88024e217960] crash_kexec at ffffffff81105470
 #3 [ffff88024e217978] oops_end at ffffffff8168f508
 #4 [ffff88024e2179a0] die at ffffffff8102e93b
 #5 [ffff88024e2179d0] do_trap at ffffffff8168ebc0
 #6 [ffff88024e217a20] do_invalid_op at ffffffff8102b144
 #7 [ffff88024e217ad0] invalid_op at ffffffff816984de
    [exception RIP: raid5_get_active_stripe+1809]
    RIP: ffffffffa0e4ed71  RSP: ffff88024e217b88  RFLAGS: 00010086
    RAX: 0000000000000000  RBX: ffff883fe5e40408  RCX: dead000000000200
    RDX: 0000000000000000  RSI: 0000000000000000  RDI: ffff8823ebf45ca0
    RBP: ffff88024e217c30  R8:  ffff8823ebf45cb0  R9:  0000000000000080
    R10: 0000000000000006  R11: 0000000000000000  R12: ffff883fe5e40400
    R13: ffff8823ebf45ca0  R14: 0000000000000000  R15: 0000000000000000
    ORIG_RAX: ffffffffffffffff  CS: 0010  SS: 0018
 #8 [ffff88024e217b80] raid5_get_active_stripe at ffffffffa0e4e996 [raid456]
 #9 [ffff88024e217be0] raid5_release_stripe at ffffffffa0e48f24 [raid456]
#10 [ffff88024e217c38] raid5_sync_request at ffffffffa0e53c4b [raid456]
#11 [ffff88024e217ca8] md_do_sync at ffffffff814fef9b
#12 [ffff88024e217e50] md_thread at ffffffff814fb1b5
#13 [ffff88024e217ec8] kthread at ffffffff810b06ff
#14 [ffff88024e217f50] ret_from_fork at ffffffff81696b98

It appears to be triggered by BUG_ON(sh->batch_head) in init_stripe():

crash> l drivers/md/raid5.c:524
519     static void init_stripe(struct stripe_head *sh, sector_t sector, int previous)
520     {
521             struct r5conf *conf = sh->raid_conf;
522             int i, seq;
523
524             BUG_ON(atomic_read(&sh->count) != 0);
525             BUG_ON(test_bit(STRIPE_HANDLE, &sh->state));
526             BUG_ON(stripe_operations_active(sh));
527             BUG_ON(sh->batch_head);                 <<<
528

Other I/Os were being processed at this time, but I am not sure how to check whether they were actually on the same md (the crash commands I was planning to use for that are at the end of this mail):

crash> ps | grep ">"
<snip>
>  59684      2  28  ffff88407fdc0fb0  RU   0.0       0      0  [md0_raid6]
>  61479      2  17  ffff883e46e80000  UN   0.0       0      0  [ll_ost_io01_001]
> 220748      2  23  ffff881fb7ab8fb0  UN   0.0       0      0  [ll_ost_io01_011]
> 220750      2  19  ffff881fb7abce70  UN   0.0       0      0  [ll_ost_io01_013]
> 279158      2  14  ffff883ab46b4e70  RU   0.0       0      0  [md22_resync]
> 280061      2  34  ffff8800757cde20  RU   0.0       0      0  [md0_resync]
> 280829      2   6  ffff881c72296dd0  RU   0.0       0      0  [md6_resync]

Example of a possible concurrent writing thread:

crash> bt 61479
PID: 61479  TASK: ffff883e46e80000  CPU: 17  COMMAND: "ll_ost_io01_001"
 #0 [ffff883ffc805e58] crash_nmi_callback at ffffffff8104d2e2
 #1 [ffff883ffc805e68] nmi_handle at ffffffff8168f699
 #2 [ffff883ffc805eb0] do_nmi at ffffffff8168f813
 #3 [ffff883ffc805ef0] end_repeat_nmi at ffffffff8168ead3
    [exception RIP: _raw_spin_lock_irq+63]
    RIP: ffffffff8168e09f  RSP: ffff883e46e3f588  RFLAGS: 00000002
    RAX: 00000000000044c2  RBX: ffff883fe5e40408  RCX: 000000000000c464
    RDX: 000000000000c468  RSI: 000000000000c468  RDI: ffff883fe5e40408
    RBP: ffff883e46e3f588  R8:  0000000000000000
    R9:  0000000000000080  R10: 0000000000000002  R11: 0000000000000000
    R12: ffff883fe5e40400  R13: 0000000000000000  R14: ffff883fe0e61900
    R15: 0000000000000000  ORIG_RAX: ffffffffffffffff  CS: 0010  SS: 0018
--- <NMI exception stack> ---
 #4 [ffff883e46e3f588] _raw_spin_lock_irq at ffffffff8168e09f
 #5 [ffff883e46e3f590] raid5_get_active_stripe at ffffffffa0e4e6cb [raid456]
 #6 [ffff883e46e3f648] raid5_make_request at ffffffffa0e4ef55 [raid456]
 #7 [ffff883e46e3f738] md_make_request at ffffffff814f7dfc
 #8 [ffff883e46e3f798] generic_make_request at ffffffff812ee939
 #9 [ffff883e46e3f7e0] submit_bio at ffffffff812eea81
#10 [ffff883e46e3f838] osd_submit_bio at ffffffffa10a0bcc [osd_ldiskfs]
#11 [ffff883e46e3f848] osd_do_bio at ffffffffa10a3007 [osd_ldiskfs]
#12 [ffff883e46e3f968] osd_write_commit at ffffffffa10a3b94 [osd_ldiskfs]
#13 [ffff883e46e3fa08] ofd_commitrw_write at ffffffffa1113774 [ofd]
#14 [ffff883e46e3fa80] ofd_commitrw at ffffffffa1116f2d [ofd]
#15 [ffff883e46e3fb08] obd_commitrw at ffffffffa0c43c22 [ptlrpc]
#16 [ffff883e46e3fb70] tgt_brw_write at ffffffffa0c1bfc1 [ptlrpc]
#17 [ffff883e46e3fcd8] tgt_request_handle at ffffffffa0c18275 [ptlrpc]
#18 [ffff883e46e3fd20] ptlrpc_server_handle_request at ffffffffa0bc41fb [ptlrpc]
#19 [ffff883e46e3fde8] ptlrpc_main at ffffffffa0bc82b0 [ptlrpc]
#20 [ffff883e46e3fec8] kthread at ffffffff810b06ff
#21 [ffff883e46e3ff50] ret_from_fork at ffffffff81696b98

The same issue happened during a heavy IOR benchmark a few months ago and was described in https://jira.hpdd.intel.com/browse/LU-8917 (on a slightly older el7 kernel). I also found that other users have reported similar issues, for example in this thread: https://lkml.org/lkml/2016/12/23/205

Now that I have a crash dump, I am trying to understand why sh->batch_head could be set in init_stripe(), which is called by raid5_get_active_stripe() when __find_stripe() has failed BUT get_free_stripe() has succeeded (a paraphrased sketch of that path is at the end of this mail). If sh->batch_head is set in that case, it means the idle stripe that was handed out still had it set…

Does someone have any idea how to troubleshoot or solve this?

Thanks!
Stephan
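
PS: for anyone who wants to follow along, this is roughly the path I am referring to. It is paraphrased from my reading of the upstream raid5.c of that era, not copied verbatim from the el7 sources, so details may well differ:

    /* paraphrased sketch of raid5_get_active_stripe(), not verbatim el7 code */
    spin_lock_irq(conf->hash_locks + hash);
    do {
            sh = __find_stripe(conf, sector, conf->generation - previous);
            if (!sh) {
                    if (!test_bit(R5_INACTIVE_BLOCKED, &conf->cache_state))
                            sh = get_free_stripe(conf, hash);  /* take an idle stripe */
                    if (!sh) {
                            /* wait for a stripe to become inactive, then retry */
                    } else {
                            init_stripe(sh, sector, previous); /* BUG_ON(sh->batch_head) fires here */
                            atomic_inc(&sh->count);
                    }
            } else {
                    /* reuse the stripe_head already cached for this sector */
            }
    } while (sh == NULL);
    spin_unlock_irq(conf->hash_locks + hash);

So the only way I can see to hit this BUG_ON is if a stripe_head sitting on the inactive list still has batch_head set at the moment get_free_stripe() hands it out.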
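This is how I was planning to dig further into the dump. I have not checked the disassembly yet, so the register-to-variable mapping is only an assumption on my part (RDI/R13 in the md0_resync frame looking like sh, and R12 in both frames looking like conf); the addresses below are simply copied from the two backtraces above:

    crash> dis -l ffffffffa0e4ed71
    crash> struct stripe_head ffff8823ebf45ca0 -x
    crash> struct stripe_head.batch_head,state,sector,raid_conf ffff8823ebf45ca0
    crash> struct r5conf.mddev ffff883fe5e40400
    crash> bt -f 61479

i.e. map the faulting RIP back to a raid5.c line, dump the suspected stripe_head, find which mddev the suspected conf belongs to, and walk the writer's stack frames to fish out the conf/stripe it is working on. If R12 really is conf in both frames, the identical value would at least confirm that the resync thread and the ll_ost_io01_001 writer are working on the same array.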