Re: 2 Disks Jumped Out While Reshaping RAID5

"Majed B." <majedb@xxxxxxxxx> · Mon, 7 Sep 2009 03:44:11 +0300

Thanks a lot Neil for your help :)

The kernel logs showed a SATA link error for sdg. I double-checked the
cables and they were fine, and the array had been running for weeks
before I started the reshape, with no errors reported before the
reshaping process.
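
To pull those errors out of the kernel log again later, something like
this should do. It's a rough sketch: sata_errors is a made-up name, and
it only greps for the usual libata error messages, so the list of
patterns is not exhaustive.

```shell
#!/bin/sh
# Hypothetical helper: filter kernel log text on stdin down to common libata
# error messages ("exception", "hard resetting link", "SError",
# "failed command"). Typical patterns only, not an exhaustive list.
#   dmesg | sata_errors
#   sata_errors < /var/log/kern.log    (log path is an assumption; varies by distro)
sata_errors() {
    grep -iE 'ata[0-9]+(\.[0-9]+)?: .*(exception|hard resetting link|SError|failed command)'
}
```

Harmless lines such as "ata7: SATA link up 3.0 Gbps" are filtered out. That leaves only the resets and exceptions, whose timestamps can be lined up with the times the disks dropped out.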

I'm using an MSI motherboard (MS-7514) and have been having random
issues with it since going up to 6 disks. I've recently ordered an EVGA
motherboard, and if things turn out to be stable on it, I'll ditch MSI
for good.

While searching over the past 6 days, I noticed people complaining
about ACPI and APIC causing issues, so I turned them off and will see
how things turn out.

These are the hard disks I'm using:

root@Adam:~# hddtemp /dev/sd[a-h]
/dev/sda: WDC WD10EACS-00D6B1: 26°C
/dev/sdb: WDC WD10EACS-00D6B1: 28°C
/dev/sdc: WDC WD10EACS-00ZJB0: 29°C
/dev/sdd: WDC WD10EADS-65L5B1: 27°C
/dev/sde: WDC WD10EADS-65L5B1: 28°C
/dev/sdf: MAXTOR STM31000340AS: 28°C
/dev/sdg: WDC WD10EACS-00ZJB0: 26°C
/dev/sdh: WDC WD10EADS-00L5B1: 25°C
/dev/sdi: Hitachi HDS721680PLAT80: 32°C

(sdi is the OS disk)

Neil, can you suggest any particular tests or stress tests to put sdg through?

I'll run a couple of short and long SMART self-tests on it, and have dd
read the whole disk a couple of times to make sure every sector reads
properly. Is that sufficient?
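
Roughly, the plan as shell functions. This is a sketch only. It assumes
smartmontools and coreutils are installed and that it runs as root. The
function names are made up, and /dev/md0 in the example comments is a
guess at the array name.

```shell
#!/bin/sh
# Burn-in sketch for a suspect RAID member (e.g. /dev/sdg) before re-adding it.
# Hypothetical helper names; not part of smartctl, dd or mdadm.

# Start a drive-internal SMART self-test ("short" or "long"). smartctl returns
# at once and the drive runs the test itself; a long test on a 1 TB disk can
# take hours. Start the next test only after the previous one has finished.
smart_selftest() {
    smartctl -t "$2" "$1"
}

# Show the self-test log and the attributes worth watching after a link error.
smart_report() {
    smartctl -l selftest "$1"
    smartctl -A "$1" | grep -E 'Reallocated_Sector_Ct|Current_Pending_Sector|UDMA_CRC_Error_Count'
}

# Read every sector once and discard the data. Without conv=noerror, dd stops
# and exits non-zero at the first unreadable block, so the exit status is the
# pass/fail signal.
full_read() {
    dd if="$1" of=/dev/null bs=1M 2>/dev/null
}

# Succeed only if the mdstat file (default /proc/mdstat) is readable and shows
# no reshape, recovery or resync in progress.
md_idle() {
    f="${1:-/proc/mdstat}"
    [ -r "$f" ] && ! grep -qE '(reshape|recovery|resync) *=' "$f"
}

# Example run (device and array names are assumptions):
#   smart_selftest /dev/sdg short     # then, once it has finished:
#   smart_selftest /dev/sdg long
#   smart_report /dev/sdg
#   full_read /dev/sdg && echo "sdg: every sector read OK"
#   md_idle && mdadm /dev/md0 --add /dev/sdg1
```

badblocks -sv is a read-only alternative to the dd pass that keeps going after an error and lists the bad blocks it finds.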

Thank you again.

On Mon, Sep 7, 2009 at 3:31 AM, NeilBrown <neilb@xxxxxxx> wrote:
> On Mon, September 7, 2009 10:01 am, Majed B. wrote:
>> I have installed mdadm 3.0 and ran -Af and now it's continuing
>> reshaping!!!
>
> Excellent.
>
> Based on the --examine info you provided it appears that
> /dev/sdg1 reported an error at about 00:10:39 on Wednesday morning
> and was evicted from the array.  Reshape was up to 2435GB (37%) at
> that point.
> Reshape continued until 06:40:04 that morning at which point it
> had reached 3201GB (49%).  At that point /dev/sdf1 seems to have
> reported an error so the whole array went off line.
>
> When you reassembled with mdadm-3.0 and --force, it excluded sdg1
> as that was the oldest, and marked sdf1 as up-to-date, and continued.
>
> The reshape process will have redone the last few chunks, so all
> the data will have been properly relocated.
>
> As all the superblocks report that the array was "State : clean",
> you can be quite sure that all your data is safe (if they were
> "State : active" there would be a small chance that a block or two
> was corrupted, and an fsck etc. would be advised).
>
> It wouldn't hurt to examine your kernel logs to see what sort of
> error was triggered at those two times, in case there might be a need
> to replace a device.
>
>> sdg1 is not in the list. Is that correct?!  sdg1 was one of the
>> array's disks before expanding. So I guess now the array is degraded
>> yet is reshaping as if it had 8 disks, correct?
>
> Yes, that is correct.
> It may be that sdg has a transient error, or it may have a serious
> media or other error.  You should convince yourself that it is working
> reliably before adding it back into the array.
>
>>
>> So after the reshaping process is over, I can add sdg1 again and it
>> will resync properly, right?
>
> Yes it will, providing no write-errors occur while writing data to it.
>
> NeilBrown
>
>

-- 
       Majed B.
--
To unsubscribe from this list: send the line "unsubscribe linux-raid" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html