RE: Raid-5 Reshape Gone Bad

<bmanning@xxxxxxxxxxxxx> · Mon, 2 Mar 2009 10:28:41 -0500

Neil,

Thanks for the tip. It appears it might work.  I will try it tonight.

I had actually been working to recreate the situation in a VMware test
bed, and I succeeded: it showed the same symptoms I had on the real
hardware. When I tried your stripe_cache_size setting, the reshape
started immediately in my VM.
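
One way to confirm that the reshape is really moving is to pull the speed
figure out of /proc/mdstat instead of eyeballing it. This is only a minimal
sketch: it assumes the progress line has the usual
"reshape = ...% (...) finish=...min speed=...K/sec" layout, and the function
name reshape_speed is made up for illustration.

```shell
#!/bin/sh
# Hypothetical helper: read mdstat-formatted text on stdin and print the
# reshape speed in K/sec. Prints nothing if no reshape line is present.
reshape_speed() {
    sed -n 's#.*reshape.*speed=\([0-9][0-9]*\)K/sec.*#\1#p'
}

# Usage on a live system:
#   reshape_speed < /proc/mdstat
# A result of 0 matches the stalled state described in this thread.
```

A value that stays at 0 across several checks suggests the reshape is still
stuck; a non-zero value suggests it has resumed.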

(And you were right: the newest mdadm on any of the rescue CDs I tried
was 2.6.7.)

Will let you know how it goes on the real thing tonight.

-----Original Message-----
From: linux-raid-owner@xxxxxxxxxxxxxxx
[mailto:linux-raid-owner@xxxxxxxxxxxxxxx] On Behalf Of NeilBrown
Sent: Sunday, March 01, 2009 10:33 PM
To: Brian Manning
Cc: linux-raid@xxxxxxxxxxxxxxx
Subject: Re: Raid-5 Reshape Gone Bad

On Mon, March 2, 2009 1:42 pm, Brian Manning wrote:
> I've been running a MD three-drive raid-5 for a while now with no 
> problems on a CentOS 5.2 i386 box.  I've attempted to add a fourth 
> drive to the array yesterday & grow it.  This is where things got
> ugly....
>
> It began the reshape as expected, some hours later I rebooted the box 
> for another reason entirely, forgetting about the reshape that was 
> still going on.  But it was a clean shutdown process and md stopped 
> just fine.  So I wasn't too worried about it, I knew it would just pick 
> up again once it booted.
>
> After startup the kernel found the md, said it was to resume the 
> reshape... then it came time for the kernel to mount root.. and hung 
> scanning for Logical Volumes, I left it for over an hour, it never 
> proceeded past this stage.  Disk io light was off, nothing going on.
>
> My entire OS save /boot is on the raid-5, split across several LVM2s 
> inside that md device.  It's always worked fine for me in the past.
>
> But now LVM is hanging on boot, I can't even get into single mode or 
> anything like that.  So I bring out the boot disc and go into rescue
> mode.
>
> I check the raid status, everything looks okay, so I manually start 
> the MD again from the boot cd, and that fires up as expected, 
> however.... when I look at /proc/mdstat... the speed is 0KB/sec, and 
> the ETA is growing by 100's of minutes a second.
>
> I let this go for about 2 hours, and nothing ever happens, speed is 0,
> diskio light is off, nothing is happening.

I notice that your array has a chunksize of 1024K.
That is big enough to cause an issue that was only resolved in
mdadm-2.6.8, which I suspect you aren't using.

If you
  echo 1024 > /sys/block/md0/md/stripe_cache_size
it might spring to life.

I think the 1024 is right, but if it doesn't work try a larger number
(e.g. 8192) just in case I got the math wrong.
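
A rough sketch of the arithmetic behind the 1024, for anyone who wants to
redo it for a different chunk size. It assumes the reshape needs four
stripe-cache entries per page of chunk and that a page is 4 KiB (as on
i386); both are assumptions rather than something stated in this thread,
and the function name min_stripe_cache is made up for illustration.

```shell
#!/bin/sh
# Hypothetical helper: estimate the smallest stripe_cache_size that lets a
# reshape proceed, ASSUMING the raid5 driver wants 4 cache entries per page
# of chunk (an assumption, not confirmed in this thread).
#   $1 = chunk size in KiB
#   $2 = page size in KiB (optional, defaults to 4)
min_stripe_cache() {
    chunk_kib=$1
    page_kib=${2:-4}
    echo $(( chunk_kib / page_kib * 4 ))
}

min_stripe_cache 1024    # 1024 KiB chunk, 4 KiB pages -> prints 1024

# To apply the result (as root; md0 is the device named above):
#   echo "$(min_stripe_cache 1024)" > /sys/block/md0/md/stripe_cache_size
```

Under those assumptions a 1024K chunk needs 1024 / 4 * 4 = 1024 entries,
which agrees with the value suggested above.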

And:  no, you cannot go back to a 3 drive array.  The transformation is
currently one-way.

NeilBrown

--
To unsubscribe from this list: send the line "unsubscribe linux-raid" in
the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info
at  http://vger.kernel.org/majordomo-info.html
