Re: Mdadm server eating drives


On 7/2/2013 10:48 AM, Barrett Lewis wrote:
> After sending the last email I went out and bought 2 new WD reds, and
> a new motherboard.  I came back and in those 2 hours all but 1 of my
> drives failed to the point of being unable to read the superblock so
> it really seems like my array is ended.

The drive may be ok.  They all may be.

> On Mon, Jul 1, 2013 at 8:57 PM, Stan Hoeppner <stan@xxxxxxxxxxxxxxxxx> wrote:
>>> I noticed one drive was going up and down and determined that
>>> the drive had actual physical damage to the power connector and
>>> was losing and regaining power through vibration.
>>
>> This intermittent contact could have damaged the PSU.  You've continued
>> to have drive and lockup problems since replacing this drive with bad
>> connector.
> 
> I hadn't thought of it until you said so but I bet you are right about
> the iffy connector.  It certainly seemed as if I never had an issue
> with the array for 8 months, and then suddenly everything got unstable
> at once, and since then I've lost at least 6 hard drives.

Your drives may not be toast.  Don't toss them out, and don't throw up
your hands yet.

>> The pink elephant in the room is thermal failure due to insufficient
>> airflow.  The symptoms you describe sound like drives overheating.  What
>> chassis is this?  Make/model please.  If you've installed individual
>> drive hot swap cages, etc, it would be helpful if you snapped a photo or
>> two and made those available.
>
> It is also possible that there were cooling issues.  The case is an
> NZXT H2.  It has some fans blowing directly on all the hard drives,
> but there were a few times I have to admit I took the fans off to work
> on things and forgot to put them back on for a few days, coming back
> to find them very hot to the touch.  I would have mentioned that
> earlier, but a data recovery place told me that it was unlikely that
> would be a culprit (after they had my money).

I checked out the chassis on the NZXT site.  With the front fans
removed, you have only 2x120mm low rpm, low static pressure, low CFM
exhaust fans, one in the PSU and one top rear.  With 8 drives packed in
such close proximity and with other lower resistance intake paths (the
perforated chassis bottom), you won't get enough air through the front
drive cage to cool those drives properly over a long period.

However, running with the two front fans removed for a couple of days on
an occasion or two shouldn't have overheated the drives to the point of
permanent damage, assuming ambient air temp was ~75F or lower and
assuming you were not performing long array operations such as rebuilds
or reshapes.  If you were, the drives could have gotten hot enough, for
long enough, to be permanently damaged.
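
If you want numbers instead of "hot to the touch", here is a minimal
sketch for pulling a drive temperature out of `smartctl -A` output.  It
assumes the drive reports SMART attribute 194 (Temperature_Celsius); some
drives use a different attribute or none at all.  The sample text and the
45C warning threshold are my own placeholders, not vendor specs and not
taken from your drives.

```python
import re

# Hypothetical sample of `smartctl -A /dev/sdX` output; values are invented.
SAMPLE = """\
  9 Power_On_Hours          0x0032   093   093   000    Old_age   Always       -       5840
194 Temperature_Celsius     0x0022   110   098   000    Old_age   Always       -       40
"""

WARN_C = 45  # assumed warning threshold, not a vendor specification


def drive_temp_c(smart_text):
    """Return the raw value of SMART attribute 194, or None if absent."""
    for line in smart_text.splitlines():
        fields = line.split()
        # Attribute rows have at least 10 columns; the raw value is the 10th.
        if len(fields) >= 10 and fields[0] == "194":
            m = re.match(r"\d+", fields[9])
            return int(m.group()) if m else None
    return None


def f_to_c(deg_f):
    """Convert Fahrenheit to Celsius."""
    return (deg_f - 32) * 5.0 / 9.0


temp = drive_temp_c(SAMPLE)
print("drive temp (C):", temp, "warn" if temp is not None and temp >= WARN_C else "ok")
print("75F ambient in C:", round(f_to_c(75), 1))
```

To use it on a real drive, feed it the text captured from `smartctl -A`
for each device instead of SAMPLE.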

> Maybe that's all academic at this point.  I guess I'll have to rebuild
> my server from scratch since all my disks seem destroyed and I can't
> trust the mobo, cpu, or psu.

Don't start over.  Not just yet.  Leave everything as is for now.
Simply replace the PSU.  Fire it up and see what you can recover.

> The psu wasn't dirt cheap, Thermaltake TR2 500w @ $58.  

The price isn't relevant.  The quality and rail configuration are, as is
whether it's been damaged.  I checked the spec on your TR2-500
yesterday.  It has dual +12V rails, one rated at 18A and one at 17A.  I
was unable to locate a wiring diagram for it.  On paper it should have
plenty of juice for your gear when in working order.  My assumption here
is that something internal to it may have failed.
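
To put rough numbers on "plenty of juice on paper", here is a
back-of-envelope sketch.  The rail ratings are the 18A and 17A figures
from the spec; the 8 drive count is from your chassis.  The 2.0A per
drive spin-up draw on +12V is purely my assumption for illustration, so
check your drives' datasheets.  Since I don't have the wiring diagram,
the sketch takes the worst case of every drive hanging off the weaker
rail.  It also ignores any combined +12V limit, which is often lower than
the sum of the individual rails.

```python
VOLTS = 12.0
RAILS_A = {"+12V1": 18.0, "+12V2": 17.0}  # rail ratings from the PSU spec

N_DRIVES = 8     # drive count mentioned for this chassis
SPINUP_A = 2.0   # ASSUMPTION: per-drive +12V spin-up current, illustrative only


def rail_watts(amps, volts=VOLTS):
    """Power available on a rail: P = V * I."""
    return amps * volts


def spinup_amps(n_drives, amps_each=SPINUP_A):
    """Total +12V current if all drives spin up at the same moment."""
    return n_drives * amps_each


for name, amps in RAILS_A.items():
    print(name, amps, "A =", rail_watts(amps), "W")

# Worst case: all drives wired to the weaker rail (wiring diagram unknown).
weakest = min(RAILS_A.values())
need = spinup_amps(N_DRIVES)
print("spin-up draw:", need, "A; weakest rail:", weakest, "A; headroom:", weakest - need, "A")
```

Under those assumptions the drives fit, but with little headroom if they
all land on one rail, which is why the wiring matters.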

> Should I buy all new
> everything?  

I wouldn't.  Most of your gear is probably fine.  Get the PSU swapped
out and see if that fixes it.  You may still have to wipe the drives and
build a new array.  You should know pretty quickly whether the PSU swap
fixed the problem: either the drives stop dropping or they don't.  You
already have a new mobo in hand, so if the PSU isn't the problem, swap
the mobo.  That's a good chassis design with good airflow, assuming you
keep the front fans in it.  Why you'd leave them removed is beyond me.
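
One way to tell quickly whether drives are still dropping is to watch
/proc/mdstat.  Below is a minimal sketch that parses mdstat text and
reports members flagged (F) and arrays running with fewer active devices
than configured.  The sample text, device names, and array name are
hypothetical, and the pattern only handles plain "mdN" array names.

```python
import re

# Hypothetical /proc/mdstat contents; device and array names are invented.
SAMPLE = """\
Personalities : [raid6] [raid5] [raid4]
md0 : active raid6 sdb1[0] sdc1[1] sdd1[2](F) sde1[3]
      3906764800 blocks super 1.2 level 6, 512k chunk, algorithm 2 [4/3] [UU_U]

unused devices: <none>
"""


def failed_members(mdstat_text):
    """Map array name -> list of member devices marked (F)."""
    failed = {}
    for line in mdstat_text.splitlines():
        m = re.match(r"^(md\d+)\s*:\s*(.*)$", line)
        if m:
            array, rest = m.groups()
            bad = re.findall(r"(\w+)\[\d+\]\(F\)", rest)
            if bad:
                failed[array] = bad
    return failed


def degraded_arrays(mdstat_text):
    """Arrays whose [total/active] counter shows active < total."""
    degraded = []
    current = None
    for line in mdstat_text.splitlines():
        m = re.match(r"^(md\d+)\s*:", line)
        if m:
            current = m.group(1)
            continue
        c = re.search(r"\[(\d+)/(\d+)\]", line)
        if c and current and int(c.group(2)) < int(c.group(1)):
            degraded.append(current)
    return degraded


print("failed:", failed_members(SAMPLE))
print("degraded:", degraded_arrays(SAMPLE))
```

On the real box you would read the text from /proc/mdstat instead of
SAMPLE and run it periodically after the PSU swap.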

> If so, while I'm at it, can you suggest a set of consumer
> level hardware ideal for running a personal mdadm server.  Powered but not
> overpowered, reliable not bleeding edge.  If I need 6-8 sata ports,
> should I do onboard or get a controller?

A new HBA shouldn't be necessary.  But if you choose to go that route
further down the road I'd recommend an LSI 9211-8i.

> I still have one backup although I'm very nervous now since it's on a
> 3 disk RAID0, just asking to implode (created in an emergency).

I assume this resides on a different machine.

Swap the PSU.  Recover the array if possible.  If not, blow it away and
create a new one.  If no drives drop out you're probably golden and the
PSU swap fixed the problem.  If they drop, swap in the new mobo.  At that
point you'll have replaced everything that could be the source of the
problem except the remaining original drives.  They can't all be bad, if
any are.  Always run with those front fans installed.

-- 
Stan







