On Sat, Sep 11, 2010 at 4:43 PM, Neil Brown <neilb@xxxxxxx> wrote:
> On Sat, 11 Sep 2010 14:20:40 -0400
> Mike Hartman <mike@xxxxxxxxxxxxxxxxxxxx> wrote:
>
>> PART 3:
>>
>> Update:
>>
>> I'm even more concerned about this now, because I just started the
>> newest reshaping to add a new drive with:
>>
>> mdadm --grow -c 256 --raid-devices=5 --backup-file=/grow_md0.bak /dev/md0
>>
>> And the system output:
>>
>> mdadm: Need to backup 768K of critical section..
>>
>> cat /proc/mdstat shows the reshaping is proceeding,
>>
>> Personalities : [linear] [raid0] [raid1] [raid10] [raid6] [raid5] [raid4]
>> md0 : active raid6 sdi1[0] sdf1[5] md1p1[4] sdj1[3] sdh1[1]
>>       2929691136 blocks super 1.2 level 6, 128k chunk, algorithm 2 [5/5] [UUUUU]
>>       [>....................]  reshape =  0.0% (56576/1464845568)
>> finish=2156.9min speed=11315K/sec
>>
>> md1 : active raid0 sdg1[0] sdk1[1]
>>       1465141760 blocks super 1.2 128k chunks
>>
>> unused devices: <none>
>>
>> but I've checked for /grow_md0.bak and it's not there. So it looks
>> like for some reason it ignored my backup file option.
>
> It didn't.
>
> When you are making an array larger, you only need the backup file for a
> small 'critical region' at the beginning of the reshape - 768K worth in
> your case.
>
> Once that is complete the backup-file is not needed and so is removed.
>
> So your current situation is no worse than before.

Ok. When I did the reshape from RAID 5 to RAID 6 (moving from 3 disks
to 4) it kept the backup file around until at least 13% (since that's
when it locked and I had to restart it with the backup), but I imagine
that's a less common case than just growing an array. Your comments
give me renewed confidence.

>
> [When making an array smaller, the critical section happens at the very
> end, so mdadm keeps the backup file around - unused - until then. Then
> uses it quickly and completes.
> When reshaping an array without changing the size the 'critical section'
> lasts for the entire time so a backup file is needed and is very heavily
> used]
>
> I don't know yet what is causing the lock-up. A quick look at your logs
> suggests that it could be related to the barrier handling. Maybe trying
> to handle a barrier during a reshape is prone to races of some sort - I
> wouldn't be very surprised by that.

Just note that during the second lockup no reshape or resync was going
on. The array state was stable; I was just writing to it.

>
> I'll have a look at the code and see what I can find.

Thanks a lot. If it was only a risk when I was growing/reshaping the
array, and covered by the backup file, it would just be an
inconvenience. But since it can seemingly happen at any time it's a
problem.

>
> Thanks for the report,
> NeilBrown
>
>
>>
>> This scares me, because if I experience the lockup again and am forced
>> to reboot, without a backup file I'm afraid my array will be hosed.
>> I'm also afraid to stop it cleanly right now for the same reason.
>>
>> So in addition to fixing the lockup itself, does anyone know if
>> there's a way to either cancel this reshaping or belatedly add the
>> backup file in a different way so it will be recoverable? It's only at
>> 1% and says it will take another 2193 minutes.
>>
>> Mike
>> --
>> To unsubscribe from this list: send the line "unsubscribe linux-raid" in
>> the body of a message to majordomo@xxxxxxxxxxxxxxx
>> More majordomo info at http://vger.kernel.org/majordomo-info.html
>
>
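
For reference, the three cases Neil describes (array gets larger, gets
smaller, or stays the same size) can be sketched roughly as below. This is
only an illustration, not how mdadm decides internally. It assumes the
component size stays the same, so the array size follows the number of data
devices (RAID 4/5 use one device's worth of parity, RAID 6 two), and it
ignores chunk-size changes. The function names are made up for the example.

```python
# Parity devices per RAID level (standard for RAID 4, 5 and 6).
PARITY_DEVICES = {4: 1, 5: 1, 6: 2}


def data_devices(level, raid_devices):
    """Number of devices' worth of data, i.e. total minus parity."""
    return raid_devices - PARITY_DEVICES[level]


def backup_file_usage(old_level, old_devices, new_level, new_devices):
    """Classify when the critical section occurs, per Neil's description.

    Assumption: same component size before and after, so 'larger' and
    'smaller' are judged purely by the data-device count.
    """
    before = data_devices(old_level, old_devices)
    after = data_devices(new_level, new_devices)
    if after > before:
        return "start"   # growing: small critical region at the beginning
    if after < before:
        return "end"     # shrinking: critical section at the very end
    return "entire"      # same size: backup file in use for the whole reshape


if __name__ == "__main__":
    # RAID 5 on 3 disks -> RAID 6 on 4 disks: 2 data devices either way.
    print(backup_file_usage(5, 3, 6, 4))
    # RAID 6 on 4 disks -> RAID 6 on 5 disks: 2 -> 3 data devices.
    print(backup_file_usage(6, 4, 6, 5))
```

Under those assumptions the earlier RAID 5 to RAID 6 conversion (3 disks to
4) falls into the "same size" case, which would be consistent with the
backup file still being present at 13%, while the current 4-to-5 grow only
needs it at the start.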