On Sun, May 15, 2011 at 5:37 PM, NeilBrown <neilb@xxxxxxx> wrote:
> On Sun, 15 May 2011 13:33:28 -0400 Glen Dragon <glen.dragon@xxxxxxxxx> wrote:
>
>> In trying to reshape a raid5 array, I encountered some problems.
>> I was trying to reshape from raid5 3->4 devices. The reshape process
>> started with seemingly no problems; however, I noticed in the kernel log
>> a number of "ata3.00: failed command: WRITE FPDMA QUEUED" errors.
>> In trying to determine if this was going to be bad for me, I disabled
>> NCQ on this device. Looking at the log, I noticed that around the same
>> time /dev/sdd reported problems and took itself offline.
>> At this point the reshape seemed to be continuing without issue, even
>> though one of the drives was offline. I wasn't sure that this made
>> sense.
>>
>> Shortly after, I noticed that the progress on the reshape had stalled.
>> I tried changing the stripe_cache_size from 256 to [1024|2048|4096],
>> but the reshape did not resume. top reported that the reshape process
>> was using 100% of one core, and the load average was climbing into the
>> 50s.
>>
>> At this point I rebooted. The array does not start.
>>
>> Can the reshape be restarted? I cannot figure out where the backup
>> file ended up. It does not seem to be where I thought I saved it.
>
> When a reshape is increasing the size of the array, the backup file is only
> needed for the first few stripes. After that it is irrelevant and is removed.
>
> You should be able to simply reassemble the array and it should continue the
> reshape.
>
> What happens when you try:
>
>   mdadm -S /dev/md_d2
>   mdadm -A /dev/md_d2 /dev/sd[abc]5 -vv
>
> Please report both the messages from mdadm and any new messages in "dmesg"
> at the time.
>
> NeilBrown
>

# mdadm -S /dev/md_d2
mdadm: stopped /dev/md_d2
# mdadm -A /dev/md_d2 /dev/sd[abcd]5 -vv
mdadm: looking for devices for /dev/md_d2
mdadm: /dev/sda5 is identified as a member of /dev/md_d2, slot 0.
mdadm: /dev/sdb5 is identified as a member of /dev/md_d2, slot 1.
mdadm: /dev/sdc5 is identified as a member of /dev/md_d2, slot 3.
mdadm: /dev/sdd5 is identified as a member of /dev/md_d2, slot 2.
mdadm:/dev/md_d2 has an active reshape - checking if critical section needs to be restored
mdadm: No backup metadata on device-3
mdadm: added /dev/sdb5 to /dev/md_d2 as 1
mdadm: added /dev/sdd5 to /dev/md_d2 as 2
mdadm: added /dev/sdc5 to /dev/md_d2 as 3
mdadm: added /dev/sda5 to /dev/md_d2 as 0
mdadm: /dev/md_d2 assembled from 3 drives - not enough to start the array while not clean - consider --force.
# mdadm -D /dev/md_d2
mdadm: md device /dev/md_d2 does not appear to be active.
# cat /proc/mdstat
Personalities : [raid6] [raid5] [raid4] [multipath] [raid1]
md_d2 : inactive sda5[0](S) sdc5[3](S) sdd5[2](S) sdb5[1](S)
      2799357952 blocks super 0.91

md8 : active raid5 sdh1[0] sdg1[4] sdf1[1] sdi1[3] sde1[2]
      5860542464 blocks level 5, 512k chunk, algorithm 2 [5/5] [UUUUU]

md1 : active raid5 sdd3[2] sdb3[1] sda3[0]
      62926336 blocks level 5, 256k chunk, algorithm 2 [3/3] [UUU]

md0 : active raid1 sdb1[1] sda1[0] sdd1[2]
      208704 blocks [3/3] [UUU]

kernel log:
md: md_d2 stopped.
md: unbind<sda5>
md: export_rdev(sda5)
md: unbind<sdc5>
md: export_rdev(sdc5)
md: unbind<sdd5>
md: export_rdev(sdd5)
md: unbind<sdb5>
md: export_rdev(sdb5)
md: md_d2 stopped.
md: bind<sdb5>
md: bind<sdd5>
md: bind<sdc5>
md: bind<sda5>
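
For context on the last mdadm message: as far as I understand it, mdadm applies an availability check like the one below for RAID4/5 before starting an array. This is a minimal sketch of that rule, not mdadm's actual code; the function name `enough` and its parameters are assumptions for illustration. It also assumes, as the message says, that only 3 of the 4 members were counted as usable (which one was left out is not visible in the log above). A clean array may run with one member missing, but an array that was not shut down cleanly needs every member, because parity for stripes that were in flight may be stale.

```python
def enough(raid_disks: int, available: int, clean: bool) -> bool:
    """Hypothetical model of the RAID4/RAID5 "can this array start?" check.

    raid_disks: number of member slots in the array (4 after a 3->4 reshape)
    available:  members that assembly counted as usable
    clean:      whether the array was marked clean at shutdown
    """
    if clean:
        # A clean RAID5 can run degraded with one member missing.
        return available >= raid_disks - 1
    # A dirty array needs every member, since parity may be inconsistent.
    return available >= raid_disks


# The situation in the log: 4 member slots, 3 usable, array not clean.
print(enough(raid_disks=4, available=3, clean=False))  # False
print(enough(raid_disks=4, available=3, clean=True))   # True
```

Under this model, the "consider --force" hint is mdadm suggesting that the clean requirement be overridden, which carries the risk of stale parity on the stripes that were being written when the array went down.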