--- On Sun, 26/9/10, Mike Hartman <mike@xxxxxxxxxxxxxxxxxxxx> wrote:

> From: Mike Hartman <mike@xxxxxxxxxxxxxxxxxxxx>
> Subject: Accidental grow before add
> To: linux-raid@xxxxxxxxxxxxxxx
> Date: Sunday, 26 September, 2010, 8:27
>
> I think I may have mucked up my array, but I'm hoping somebody can
> give me a tip to retrieve the situation.
>
> I had just added a new disk to my system and partitioned it in
> preparation for adding it to my RAID 6 array, growing it from 7
> devices to 8. However, I jumped the gun (guess I'm more tired than I
> thought) and ran the grow command before I added the new disk to the
> array as a spare.
>
> In other words, I should have run:
>
> mdadm --add /dev/md0 /dev/md3p1
> mdadm --grow /dev/md0 --raid-devices=8 --backup-file=/grow_md0.bak
>
> but instead I just ran
>
> mdadm --grow /dev/md0 --raid-devices=8 --backup-file=/grow_md0.bak
>
> I immediately checked /proc/mdstat and got the following output:
>
> Personalities : [linear] [raid0] [raid1] [raid10] [raid6] [raid5] [raid4]
> md0 : active raid6 sdk1[0] md2p1[7] sde1[6] sdf1[5] md1p1[4] sdl1[3] sdj1[1]
>       7324227840 blocks super 1.2 level 6, 256k chunk, algorithm 2 [8/7] [UUUUUUU_]
>       [>....................]  reshape =  0.0% (79600/1464845568) finish=3066.3min speed=7960K/sec
>
> md3 : active raid0 sdb1[0] sdh1[1]
>       1465141760 blocks super 1.2 128k chunks
>
> md2 : active raid0 sdc1[0] sdd1[1]
>       1465141760 blocks super 1.2 128k chunks
>
> md1 : active raid0 sdi1[0] sdm1[1]
>       1465141760 blocks super 1.2 128k chunks
>
> unused devices: <none>
>
> At this point I figured I was probably ok. It looked like it was
> restructuring the array to expect 8 disks, and with only 7 it would
> just end up being in a degraded state. So I figured I'd just cost
> myself some time - one reshape to get to the degraded 8 disk state,
> and another reshape to activate the new disk instead of just the one
> reshape onto the new disk. I went ahead and added the new disk as a
> spare, figuring the current reshape operation would ignore it until it
> completed, and then the system would notice it was degraded with a
> spare available and rebuild it.
>
> However, things have slowed to a crawl (relative to the time it
> normally takes to regrow this array) so I'm afraid something has gone
> wrong. As you can see in the initial mdstat above, it started at
> 7960K/sec - quite fast for a reshape on this array. But just a couple
> minutes after that it had dropped down to only 667K. It worked its way
> back up through 1801K to 10277K, which is about average for a reshape
> on this array. Not sure how long it stayed at that level, but now
> (still only 10 or 15 minutes after the original mistake) it's plunged
> all the way down to 40K/s. It's been down at this level for several
> minutes and still dropping slowly. This doesn't strike me as a good
> sign for the health of the unusual regrow operation.
>
> Anybody have a theory on what could be causing the slowness? Does it
> seem like a reasonable consequence of growing an array without a spare
> attached? I'm hoping that this particular growing mistake isn't
> automatically fatal or mdadm would have warned me or asked for a
> confirmation or something. Worst case scenario I'm hoping the array
> survives even if I just have to live with this speed and wait for it
> to finish - although at the current rate that would take over a
> year...
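On the slowdown itself: I can't say this is what's happening in your case, but
the first thing I'd rule out is the md resync/reshape throttle and the stripe
cache. Roughly something like this (just a sketch, assuming the array really is
md0 and the defaults haven't been touched; the numbers are examples, not a
recommendation for your hardware):

  # current throttle values in KB/s (defaults are 1000 min / 200000 max)
  cat /proc/sys/dev/raid/speed_limit_min /proc/sys/dev/raid/speed_limit_max

  # raise the floor so the reshape isn't starved by normal I/O
  echo 50000 > /proc/sys/dev/raid/speed_limit_min

  # a bigger stripe cache (value is in pages) often helps RAID 6 reshapes
  echo 8192 > /sys/block/md0/md/stripe_cache_size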
> Dare I mount the array's partition to check on the contents,
> or would that risk messing it up worse?
>
> Here's the latest /proc/mdstat:
>
> Personalities : [linear] [raid0] [raid1] [raid10] [raid6] [raid5] [raid4]
> md0 : active raid6 md3p1[8](S) sdk1[0] md2p1[7] sde1[6] sdf1[5] md1p1[4] sdl1[3] sdj1[1]
>       7324227840 blocks super 1.2 level 6, 256k chunk, algorithm 2 [8/7] [UUUUUUU_]
>       [>....................]  reshape =  0.1% (1862640/1464845568) finish=628568.8min speed=38K/sec
>
> md3 : active raid0 sdb1[0] sdh1[1]
>       1465141760 blocks super 1.2 128k chunks
>
> md2 : active raid0 sdc1[0] sdd1[1]
>       1465141760 blocks super 1.2 128k chunks
>
> md1 : active raid0 sdi1[0] sdm1[1]
>       1465141760 blocks super 1.2 128k chunks
>
> unused devices: <none>
>
> Mike

I am more interested to know why it kicked off a reshape that would leave the
array in a degraded state without a warning and without requiring '--force'.
Are you sure there wasn't capacity to 'grow' anyway?

Also, when I first ran my reshape it was incredibly slow - that was a RAID 5 to
RAID 6 conversion though... it literally took days.
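If you want to confirm that the spare you added afterwards is really attached
and will be rebuilt onto once the reshape finishes, the array and device
superblocks should show it. Something along these lines (exact output varies
between mdadm versions):

  # array-level view: reshape progress, degraded state, and the (S) spare
  mdadm --detail /dev/md0

  # superblock of the newly added device
  mdadm --examine /dev/md3p1

  # keep an eye on the reshape without hammering the array
  watch -n 60 cat /proc/mdstat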