system hang at reboot while one disk failed in raid5

hanguozhong <hanguozhong@xxxxxxxxxxxx> · Thu, 18 Oct 2012 20:01:50 +0800



Hi, everyone:
    Yesterday I created a 4 x 2 TB RAID5 array for testing, running kernel 2.6.38.
    While the array was in recovery, I/O errors occurred on sdb.
    I ran "cat /proc/mdstat" to check the status of the array, and "smartctl -A /dev/sdb" to check sdb.
    The output was as follows:
    
    #cat /proc/mdstat 
    Personalities : [raid0] [raid6] [raid5] [raid4] 
    md127 : active raid5 sdd[3] sdc[2] sdb[1](F) sda[0]
      5860540032 blocks super 1.2 level 5, 128k chunk, algorithm 2 [4/3] [U_UU]

    unused devices: <none>

    #smartctl -A /dev/sdb
    smartctl 5.39.1 2010-01-28 r3054 [tilegx-unknown-linux-gnu] (local build)
    Copyright (C) 2002-10 by Bruce Allen, http://smartmontools.sourceforge.net

    Probable ATA device behind a SAT layer
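
    The "(F)" flag in the mdstat output above can also be picked out by a script rather than by eye. The following is only a minimal sketch in Python, assuming the /proc/mdstat layout shown above; the function name find_faulty_members and the sample text are illustrative and not part of any md tool.

```python
import re

def find_faulty_members(mdstat_text):
    """Return {array_name: [faulty member names]} parsed from /proc/mdstat text.

    Hypothetical helper: assumes array lines of the form
    "md127 : active raid5 sdd[3] sdc[2] sdb[1](F) sda[0]".
    """
    faulty = {}
    for line in mdstat_text.splitlines():
        match = re.match(r"\s*(md\d+)\s*:\s*(.*)", line)
        if not match:
            continue  # not an array line (Personalities, block counts, etc.)
        name, rest = match.groups()
        # Members marked faulty carry an "(F)" suffix after their slot number.
        failed = re.findall(r"(\w+)\[\d+\]\(F\)", rest)
        if failed:
            faulty[name] = failed
    return faulty

# Sample text copied from the mdstat output in this report.
sample = """Personalities : [raid0] [raid6] [raid5] [raid4]
md127 : active raid5 sdd[3] sdc[2] sdb[1](F) sda[0]
      5860540032 blocks super 1.2 level 5, 128k chunk, algorithm 2 [4/3] [U_UU]

unused devices: <none>
"""

print(find_faulty_members(sample))  # {'md127': ['sdb']}
```

    On a live system the same function could be fed the contents of /proc/mdstat instead of the sample string.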

    "sdb" was marked "Faulty", and smartctl printed no S.M.A.R.T. attributes for "sdb".
    I then ran "reboot" to restart the system. Unfortunately,
    the system appeared to hang during shutdown and did not come back up. The output was as follows:

    2012-10-17 20:09:16   umount: can't remount /dev/md127 read-only
    2012-10-17 20:09:16   umount: can't remount /dev/sdq2 read-only
    2012-10-17 20:09:16   umount: devpts busy - remounted read-only
    2012-10-17 20:09:17   umount: sysfs busy - remounted read-only
    2012-10-17 20:09:17   umount: proc busy - remounted read-only
    2012-10-17 20:09:17   umount: can't remount tmpfs read-only
    2012-10-17 20:09:17   umount: can't remount rootfs read-only
    2012-10-17 20:09:17   The system is going down NOW!
    2012-10-17 20:09:17   Sent SIGTERM to all processes
    2012-10-17 20:09:18   md: md127 in immediate safe mode
    2012-10-17 20:09:18   Sent SIGKILL to all processes
    2012-10-17 20:09:18   Requesting system reboot
    2012-10-17 20:09:19   gbe2: link up, 100 Mbps
    2012-10-17 20:09:19   md: stopping all md devices.
    2012-10-17 20:09:20   sd 1:0:7:0: [sdd] Synchronizing SCSI cache
    2012-10-17 20:09:20   BUG: failure at /home/mde4xx/childhood/iDCN_trunk/2.Software/Tilera_src/src/linux-2.6.38.8/drivers/md/md.c:6620/md_write_start()!
    2012-10-17 20:09:20   Kernel panic - not syncing: BUG!
    2012-10-17 20:09:20   
    2012-10-17 20:09:20   Starting stack dump of tid 7923, pid 7923 (kworker/1:1) on cpu 1 at cycle 43939696870501
    2012-10-17 20:09:20     frame 0: 0xfffffff7001935a0 dump_stack+0x0/0x20 (sp 0xfffffe407863fc18)
    2012-10-17 20:09:20     frame 1: 0xfffffff700520960 panic+0x150/0x3a0 (sp 0xfffffe407863fc18)
    2012-10-17 20:09:20     frame 2: 0xfffffff700577f08 md_write_start+0x2d8/0x320 (sp 0xfffffe407863fcc0)
    2012-10-17 20:09:20   sd 1:0:6:0: [sdo] Synchronizing SCSI cache
    2012-10-17 20:09:20     frame 3: 0xfffffff710334268 make_request.cold+0x110/0xc20 [raid456] (sp 0xfffffe407863fd10)
    2012-10-17 20:09:20     frame 4: 0xfffffff7007aa078 md_submit_flush_data+0x88/0xe0 (sp 0xfffffe407863fe28)
    2012-10-17 20:09:20     frame 5: 0xfffffff7002999e8 process_one_work+0x1e8/0x538 (sp 0xfffffe407863fe48)
    2012-10-17 20:09:20   sd 1:0:5:0: [sdn] Synchronizing SCSI cache
    2012-10-17 20:09:20     frame 6: 0xfffffff700274f78 worker_thread+0x378/0x898 (sp 0xfffffe407863fea0)
    2012-10-17 20:09:20     frame 7: 0xfffffff7000f0530 kthread+0xe0/0xe8 (sp 0xfffffe407863ff80)
    2012-10-17 20:09:20   sd 1:0:4:0: [sdm] Synchronizing SCSI cache
    2012-10-17 20:09:20     frame 8: 0xfffffff7000bab38 start_kernel_thread+0x18/0x20 (sp 0xfffffe407863ffe8)
    2012-10-17 20:09:20   Stack dump complete
    2012-10-17 20:09:20   Client requested halt.

    Can anyone help me?