Dear Neil,

> At the top of Grow_restart (in Grow.c), just put
>         return 0;
>
> That will definitely get you your array back.

Thank you for your help: I've got my array assembled, running, and all my data back!

But everything was not quite as smooth as I was (secretly) hoping. Details follow.

When I first ran ./mdadm --assemble /dev/md1, the array did assemble, but with only 6 drives out of 8. I don't know the exact reason why two partitions were missing, but attempts to add them with

> mdadm --add /dev/md1 /dev/sda2

resulted in a message along the lines of "/dev/sdc1 is locked". The partitions are there (fdisk -l /dev/sda confirms it) and they are present in /dev. I suspect udev was doing something wrong, but I don't know for sure. The missing partitions were the ones on the drives that carry my root partition /dev/md0, a RAID1 made of /dev/sda1 and /dev/sdc1.

Anyway, since it's RAID6, the array was able to run even with two drives missing. But the reshape speed was zero, i.e. /proc/mdstat looked something like this (not a verbatim copy, but a fake based on the current state, just to give an idea of what it was like before):

> md1 : active raid6 sde2[1] sdd2[7] sdb2[6] sda2[5] sdg2[3] sdf2[2]
>       2191859712 blocks super 0.91 level 6, 1024k chunk, algorithm 2 [8/6] [_UUU_UUU]
>       [=>...................]  reshape =  7.0% (51783680/730619904) finish=1571992.7min speed=0K/sec

So the speed was zero and the finish time kept growing, from tens of thousands of minutes to tens of millions and more. Any process trying to read from /dev/md1 would hang in "D" state, including mount, so I was not able to see my data at that point.

A few reboots later (during which I was debugging my boot scripts, only to find that /etc/init.d/udev never got control back after starting /sbin/udevsettle; I believe this is a separate matter not connected with md), the condition was the same on every boot: the array assembled, but with a 0K/sec reshape speed (and before somebody asks: /sys/block/md1/md/sync_speed_{min,max} had their default values of 1000 and 200000 respectively). After some reboots the array even assembled from all 8 disks, but the reshape speed was still zero.

A few more reboots later, after fixing my startup scripts, I was pleasantly surprised to hear my hard drives busily humming and to find in /proc/mdstat that the reshape speed was 800K/sec and growing (up to the current value of about 10000K/sec). The array was working from 6 partitions out of 8. /dev/md1 mounted fine and I have all my precious data back intact; needless to say, I'm very happy about that.

So now I have my array in a degraded state (6 out of 8 drives running) and reshaping. I don't feel adventurous enough to try adding drives back before the current reshape finishes. :-) I believe it's time to get some backup space for the roughly 2TB of data kept in this array.

The array's current state according to /proc/mdstat is:

> md1 : active raid6 sde2[1] sdd2[7] sdb2[6] sda2[5] sdg2[3] sdf2[2]
>       2191859712 blocks super 0.91 level 6, 1024k chunk, algorithm 2 [8/6] [_UUU_UUU]
>       [=>...................]  reshape =  8.5% (62185472/730619904) finish=969.0min speed=11495K/sec

and I'm waiting for it to finish. In the meantime /dev/md1 works fine for both reads and writes, so our file server is happily back online, much to the delight of my colleagues.

Please let me know if I can provide any information useful for debugging or fixing this issue.
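In case it is useful for reproducing the symptom, something along these lines is what I would use to log the reshape state over time instead of eyeballing /proc/mdstat. This is only a rough sketch of my own (nothing from mdadm itself), and it assumes the array is md1 and that the kernel exposes sync_action, sync_speed, sync_completed, sync_speed_min and sync_speed_max under /sys/block/md1/md/, as it does here:

/*
 * Rough sketch: poll the md sysfs attributes related to the reshape.
 * Assumes the array is md1; adjust the path for other arrays.
 * Build: gcc -std=gnu99 -o mdwatch mdwatch.c
 * Run:   ./mdwatch    (stop with Ctrl-C)
 */
#include <stdio.h>
#include <string.h>
#include <unistd.h>

static void show(const char *attr)
{
    char path[256], buf[128];
    FILE *f;

    snprintf(path, sizeof(path), "/sys/block/md1/md/%s", attr);
    f = fopen(path, "r");
    if (!f) {
        printf("%-16s <cannot open %s>\n", attr, path);
        return;
    }
    if (fgets(buf, sizeof(buf), f)) {
        buf[strcspn(buf, "\n")] = '\0';   /* strip trailing newline */
        printf("%-16s %s\n", attr, buf);
    }
    fclose(f);
}

int main(void)
{
    const char *attrs[] = {
        "sync_action",     /* e.g. "idle", "reshape", "resync" */
        "sync_speed",      /* current rate in K/sec */
        "sync_completed",  /* sectors done / total sectors */
        "sync_speed_min",  /* default was 1000 here ... */
        "sync_speed_max",  /* ... and 200000 */
    };
    unsigned i;

    for (;;) {
        for (i = 0; i < sizeof(attrs) / sizeof(attrs[0]); i++)
            show(attrs[i]);
        printf("\n");
        sleep(10);
    }
    return 0;
}

While the reshape was stuck this would simply have kept reporting a speed of 0, matching what /proc/mdstat showed above.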
It seems to me that something needs to be fixed on the kernel side too (but I'm not really qualified to make such judgements). I will post again after the current reshape finishes and the two "lost" partitions are added back to the array. I believe it will take more than 52 hours to finish all operations (about 16 more hours for the current reshape, plus two times 18 hours to add the two 750 GB partitions). I will let you know afterwards.

My thanks go again to Neil for the quick and efficient fix; he is living up to his reputation as a living legend of the programming world.

> I think the correct fix will be to put:
>
>     if (info->reshape_progress > SOME_NUMBER)
>             return 0;
>
> at the top of Grow_restart.  I just have to review exactly how it
> works to make sure I pick the correct "SOME_NUMBER".
>
> Also
>                         if (__le64_to_cpu(bsb.length) <
>                             info->reshape_progress)
>                                 continue; /* No new data here */
>
> might need to become
>                         if (__le64_to_cpu(bsb.length) <
>                             info->reshape_progress)
>                                 return 0; /* No new data here */
>
> but I need to think carefully about that too.

I'm looking forward to seeing new fixes and improvements for this wonderful piece of software, Linux md.

Best regards,
Anton "Ashutosh" Voloshin
Saint Petersburg, Russia (SCSMath)