On Sat, 25 Apr 2015 16:35:24 -0500 David Wahler <dwahler@xxxxxxxxx> wrote:

> Hi,
>
> I'm trying to reshape a 4-disk RAID6 array by adding a fifth "missing"
> drive. Maybe that's a weird thing to do, so for context: I'm
> converting from a 3-disk RAID10, by creating a new RAID6 with the
> three new disks and then moving disks one at a time between the
> arrays. I did it this way so that I could test for problems with the
> reshape procedure before irrevocably modifying more than one of the
> original disks.
>
> (I do also have an offsite backup of the most important data, but it's
> inconvenient to access and I'm hoping not to need it.)
>
> Anyway, the reshape was going fine until about 70% completion, and
> then it got stuck. I've tried rebooting a few times: the array can be
> assembled in read-only mode, but as soon as it goes read-write and the
> reshape process continues, it gets through a few megabytes and hangs.
> At that point, any other process that tries to access the array also
> hangs uninterruptibly.
>
> Here's what shows up in dmesg:
>
> [  721.183225] INFO: task md127_resync:1730 blocked for more than 120 seconds.
> [  721.183978]       Not tainted 4.0.0 #1
> [  721.184751] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> [  721.185514] md127_resync    D ffff88042ea94440     0  1730      2 0x00000000
> [  721.185516]  ffff88041a24ed20 0000000000000400 ffff88041ca82a20 0000000000000246
> [  721.185518]  ffff8800b8b5ffd8 ffff8800b8b5fbf0 ffff880419035a30 0000000000000004
> [  721.185519]  ffff8800b8b5fd1c ffff88040e91d000 ffffffff8155c73f ffff880419035800
> [  721.185520] Call Trace:
> [  721.185526]  [<ffffffff8155c73f>] ? schedule+0x2f/0x80
> [  721.185530]  [<ffffffffa0888390>] ? reshape_request+0x1e0/0x8f0 [raid456]
> [  721.185533]  [<ffffffff810a86f0>] ? wait_woken+0x90/0x90
> [  721.185535]  [<ffffffffa0888dae>] ? sync_request+0x30e/0x390 [raid456]
> [  721.185547]  [<ffffffffa02cbf89>] ? is_mddev_idle+0xc9/0x130 [md_mod]
> [  721.185550]  [<ffffffffa02cf432>] ? md_do_sync+0x802/0xd30 [md_mod]
> [  721.185555]  [<ffffffff8101c356>] ? native_sched_clock+0x26/0x90
> [  721.185558]  [<ffffffffa02cbb30>] ? md_safemode_timeout+0x50/0x50 [md_mod]
> [  721.185561]  [<ffffffffa02cbc56>] ? md_thread+0x126/0x130 [md_mod]
> [  721.185563]  [<ffffffff8155c0c0>] ? __schedule+0x2a0/0x8f0
> [  721.185565]  [<ffffffffa02cbb30>] ? md_safemode_timeout+0x50/0x50 [md_mod]
> [  721.185568]  [<ffffffff81089403>] ? kthread+0xd3/0xf0
> [  721.185570]  [<ffffffff81089330>] ? kthread_create_on_node+0x180/0x180
> [  721.185572]  [<ffffffff81560598>] ? ret_from_fork+0x58/0x90
> [  721.185574]  [<ffffffff81089330>] ? kthread_create_on_node+0x180/0x180
>
> And the output of mdadm --detail/-E:
> https://gist.github.com/anonymous/0b090668b56ef54bb2f0

What is wrong with simply including this directly in the email???

Anyway:

   Bad Block Log : 512 entries available at offset 72 sectors - bad blocks present.

That is the only thing that looks at all interesting. Particularly the
last 3 words.

What does

   mdadm --examine-badblocks /dev/sd[cde]1

show?

NeilBrown

> I was originally running a Debian 3.16.0 kernel, and then upgraded to
> 4.0 to see if it would help, but no such luck.
>
> Does anyone have any suggestions? Since the data on the array seems to
> be fine, hopefully there's a solution that doesn't involve re-creating
> it from scratch and restoring from backups.
>
> Thanks,
> -- David
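[Archive note] Neil's suggested check can be sketched as a short script that
dumps the bad-block log recorded in each member's superblock. The device
names (/dev/sdc1, /dev/sdd1, /dev/sde1) are the ones implied by the thread's
`/dev/sd[cde]1` glob and will differ on other systems; the `mdadm` binary is
parameterized so the loop can be previewed without touching any devices.

```shell
# Dump the md bad-block log from each array member's superblock.
# Requires root for real devices. Set MDADM=echo for a dry run that
# only prints the commands that would be executed.
examine_badblocks() {
    for dev in "$@"; do
        printf '== %s ==\n' "$dev"
        "${MDADM:-mdadm}" --examine-badblocks "$dev"
    done
}

# Typical invocation for the array in this thread:
# examine_badblocks /dev/sdc1 /dev/sdd1 /dev/sde1
```

If the log lists entries, that is consistent with Neil's reading of the
`--examine` output ("bad blocks present"): recorded bad blocks on a member
are the one thing in the metadata that stands out as a possible cause of
the stalled reshape.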