Re: raid1d crash at boot

Michał Mirosław <mirq-linux@xxxxxxxxxxxx> · Sat, 7 Jan 2012 13:53:04 +0100

On Sat, Nov 19, 2011 at 02:41:39PM +0100, Michał Mirosław wrote:
> I hit the following BUG_ON while booting, before the rootfs is mounted by
> Debian's initrd. This started happening with kernels from sometime
> during 3.1-rcX.
> 
> [    6.246170] ------------[ cut here ]------------
> [    6.246246] kernel BUG at /mnt/src-tmp/jaja/git/qmqm/drivers/scsi/scsi_lib.c:1153!
> [    6.246347] invalid opcode: 0000 [#1] PREEMPT SMP
> [    6.246558] CPU 5
> [    6.246614] Modules linked in: usb_storage uas firewire_ohci firewire_core crc_itu_t xhci_hcd [last unloaded: scsi_wait_scan]
> [    6.247131]
> [    6.247194] Pid: 288, comm: md1_raid1 Not tainted 3.2.0-rc2mq+ #5 System manufacturer System Product Name/P8Z68-V PRO
> [    6.247422] RIP: 0010:[<ffffffff812443a1>]  [<ffffffff812443a1>] scsi_setup_fs_cmnd+0x45/0x83
> [    6.247563] RSP: 0018:ffff8804140d1bd0  EFLAGS: 00010046
> [    6.247634] RAX: 0000000000000000 RBX: ffff88041d463800 RCX: 00000000ffffffff
> [    6.247710] RDX: 00000000ffffffff RSI: ffff8804142fd600 RDI: ffff88041d463800
> [    6.247785] RBP: ffff8804142fd600 R08: 00000000ffffffff R09: 0000000000017a00
> [    6.247861] R10: ffff88041d464000 R11: ffff88041d464000 R12: 0000000000000800
> [    6.247936] R13: 0000000000000001 R14: ffff88041d463800 R15: 0000000000000000
> [    6.248013] FS:  0000000000000000(0000) GS:ffff88042fb40000(0000) knlGS:0000000000000000
> [    6.248104] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
> [    6.248176] CR2: 000000000042b200 CR3: 0000000001605000 CR4: 00000000000406e0
> [    6.248252] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> [    6.248328] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
> [    6.248404] Process md1_raid1 (pid: 288, threadinfo ffff8804140d0000, task ffff88041539a4c0)
> [    6.248495] Stack:
> [    6.248557]  0000000000000000 ffff8804142fd600 ffff8804142fd600 ffffffff8124a9be
> [    6.248819]  ffff8804142fe3a0 ffff8804142fd600 ffff88041d463848 ffffffff811a5d67
> [    6.249084]  ffff8804142fe3a0 ffff880415452400 ffff8804156f0000 00000000fffffa2b
> [    6.249346] Call Trace:
> [    6.249414]  [<ffffffff8124a9be>] ? sd_prep_fn+0x2cd/0xb72
> [    6.249490]  [<ffffffff811a5d67>] ? cfq_dispatch_requests+0x6f2/0x82c
> [    6.249567]  [<ffffffff8119a168>] ? blk_peek_request+0xc8/0x1bf
> [    6.249638]  [<ffffffff81243d83>] ? scsi_request_fn+0x64/0x406
> [    6.249708]  [<ffffffff8119a526>] ? blk_flush_plug_list+0x186/0x1b7
> [    6.249780]  [<ffffffff8119a562>] ? blk_finish_plug+0xb/0x2a
> [    6.249849]  [<ffffffff812a400f>] ? raid1d+0x91/0xb22
> [    6.249919]  [<ffffffff81031729>] ? get_parent_ip+0x9/0x1b
> [    6.249990]  [<ffffffff813a5c9e>] ? sub_preempt_count+0x83/0x94
> [    6.250060]  [<ffffffff813a202a>] ? schedule+0x73f/0x772
> [    6.250129]  [<ffffffff813a5d49>] ? add_preempt_count+0x9a/0x9c
> [    6.250199]  [<ffffffff813a330b>] ? _raw_spin_lock_irqsave+0x13/0x31
> [    6.250271]  [<ffffffff812a9bb4>] ? md_thread+0xfe/0x11c
> [    6.250340]  [<ffffffff8104f6c6>] ? add_wait_queue+0x3c/0x3c
> [    6.250410]  [<ffffffff812a9ab6>] ? signal_pending+0x17/0x17
> [    6.250479]  [<ffffffff8104f045>] ? kthread+0x76/0x7e
> [    6.250548]  [<ffffffff813a8c34>] ? kernel_thread_helper+0x4/0x10
> [    6.250618]  [<ffffffff8104efcf>] ? kthread_worker_fn+0x139/0x139
> [    6.250688]  [<ffffffff813a8c30>] ? gs_change+0xb/0xb
> [    6.250754] Code: 85 c0 74 1d 48 8b 00 48 85 c0 74 15 48 8b 40 50 48 85 c0 74 0c 48 89 ee 48 89 df ff d0 85 c0 75 44 66 83 bd d0 00 00 00 00 75 02 <0f> 0b 48 89 ee 48 89 df e8 b6 e9 ff ff 48 85 c0 48 89 c2 74 20
> [    6.253544] RIP  [<ffffffff812443a1>] scsi_setup_fs_cmnd+0x45/0x83
> [    6.253658]  RSP <ffff8804140d1bd0>
> [    6.253722] ---[ end trace 533b0b5008dd7cee ]---
> [    6.253788] note: md1_raid1[288] exited with preempt_count 1

I've bisected this to the following commit. It's not trivially revertible on
v3.2, but I'll make a few attempts at reverting it.

Best Regards,
Michał Mirosław

---

commit d2eb35acfdccbe2a3622ed6cc441a5482148423b
Author: NeilBrown <neilb@xxxxxxx>
Date:   Thu Jul 28 11:31:48 2011 +1000

    md/raid1: avoid reading from known bad blocks.

    Now that we have a bad block list, we should not read from those
    blocks.
    There are several main parts to this:
      1/ read_balance needs to check for bad blocks, and return not only
         the chosen device, but also how many good blocks are available
         there.
      2/ fix_read_error needs to avoid trying to read from bad blocks.
      3/ read submission must be ready to issue multiple reads to
         different devices as different bad blocks on different devices
         could mean that a single large read cannot be served by any one
         device, but can still be served by the array.
         This requires keeping count of the number of outstanding requests
         per bio.  This count is stored in 'bi_phys_segments'
      4/ retrying a read needs to also be ready to submit a smaller read
         and queue another request for the rest.

    This does not yet handle bad blocks when reading to perform resync,
    recovery, or check.

    'md_trim_bio' will also be used for RAID10, so put it in md.c and
    export it.

    Signed-off-by: NeilBrown <neilb@xxxxxxx>