Re: Mdadm server eating drives

Forgot to ask previously.  This system is attached to a UPS, isn't it?

-- 
Stan


On 7/2/2013 2:44 PM, Stan Hoeppner wrote:
> On 7/2/2013 10:48 AM, Barrett Lewis wrote:
>> After sending the last email I went out and bought 2 new WD Reds and
>> a new motherboard.  I came back, and in those 2 hours all but 1 of my
>> drives had failed to the point of being unable to read the superblock,
>> so it really seems like my array is finished.
> 
> The drive may be ok.  They all may be.
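> 
> If you want evidence either way, the drives can speak for themselves
> via SMART.  A sketch, assuming smartmontools is installed and using
> /dev/sdb as a stand-in device name:
> 
> ```shell
> # Quick pass/fail health verdict from the drive's own firmware
> smartctl -H /dev/sdb
> 
> # Full attribute table; Reallocated_Sector_Ct and
> # Current_Pending_Sector indicate real media damage, while
> # UDMA_CRC_Error_Count points at cabling/power instead
> smartctl -A /dev/sdb
> 
> # Drive-internal long self-test (takes a few hours per drive)
> smartctl -t long /dev/sdb
> ```
> 
> CRC errors combined with clean media attributes would fit the
> bad-connector theory rather than dead platters.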
> 
>> On Mon, Jul 1, 2013 at 8:57 PM, Stan Hoeppner <stan@xxxxxxxxxxxxxxxxx> wrote:
>>>> I noticed one drive was going up and down and determined that
>>>> the drive had actual physical damage to the power connecter and
>>>> was losing and regaining power through vibration.
>>>
>>> This intermittent contact could have damaged the PSU.  You've continued
>>> to have drive and lockup problems since replacing this drive with bad
>>> connector.
>>
>> I hadn't thought of it until you said so but I bet you are right about
>> the iffy connector.  It certainly seemed as if I never had an issue
>> with the array for 8 months, and then suddenly everything got unstable
>> at once, and since then I've lost at least 6 hard drives.
> 
> Your drives may not be toast.  Don't toss them out, and don't throw up
> your hands yet.
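> 
> "Unable to read the superblock" is also worth verifying per member
> before believing it.  A sketch -- /dev/sd[b-i]1 is a stand-in for
> whatever your member partitions actually are:
> 
> ```shell
> # Dump the md superblock of each array member, if one is readable
> for d in /dev/sd[b-i]1; do
>     echo "== $d =="
>     mdadm --examine "$d"
> done
> ```
> 
> If most members still show a superblock with sane, close event counts,
> the array is likely recoverable.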
> 
>>> The elephant in the room is thermal failure due to insufficient
>>> airflow.  The symptoms you describe sound like drives overheating.  What
>>> chassis is this?  Make/model please.  If you've installed individual
>>> drive hot swap cages, etc, it would be helpful if you snapped a photo or
>>> two and made those available.
>>
>> It is also possible that there were cooling issues.  The case is an
>> NZXT H2.  It has some fans blowing directly on all the hard drives,
>> but I have to admit there were a few times I took the fans off to
>> work on things and forgot to put them back on for a few days, coming
>> back to find the drives very hot to the touch.  I would have
>> mentioned that earlier, but a data recovery place told me it was
>> unlikely to be the culprit (after they had my money).
> 
> I checked out the chassis on the NZXT site.  With the front fans
> removed, you have only 2x120mm low rpm, low static pressure, and low CFM
> exhaust fans, one in the PSU, one top rear.  With 8 drives packed in
> such close proximity and with other lower resistance intake paths (the
> perforated chassis bottom), you won't get enough air through the front
> drive cage to cool those drives properly over a long period.
> 
> However, running with the two front fans removed for a couple of days on
> an occasion or two shouldn't have overheated the drives to the point of
> permanent damage, assuming ambient air temp was ~75F or lower, and
> assuming you were not performing long array operations such as rebuilds
> or reshapes--if you did so the drives could get hot enough, long enough,
> to be permanently damaged.
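> 
> For what it's worth, the drives record their temperature too, so you
> can check whether they're currently running hot (again /dev/sdb as a
> stand-in):
> 
> ```shell
> # Attribute 194 is the current temperature on most drives; many
> # drives also keep a lifetime maximum visible in smartctl -x output
> smartctl -A /dev/sdb | grep -i temperature
> ```
> 
> Sustained operation much above ~45C is where drive life starts to
> suffer.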
> 
>> Maybe that's all academic at this point.  I guess I'll have to rebuild
>> my server from scratch since all my disks seem destroyed and I can't
>> trust the mobo, cpu, or psu.
> 
> Don't start over.  Not just yet.  Leave everything as is for now.
> Simply replace the PSU.  Fire it up and see what you can recover.
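> 
> Once the new PSU is in, something like this is the order I'd try --
> /dev/md0 and /dev/sd[b-i]1 are stand-ins for your actual array and
> member names:
> 
> ```shell
> # Try a normal assemble first
> mdadm --assemble --scan
> 
> # If members were kicked out at different times, --force tells mdadm
> # to tolerate small event-count mismatches between superblocks
> mdadm --assemble --force /dev/md0 /dev/sd[b-i]1
> 
> # Mount read-only and copy off anything important before
> # exercising the array any further
> mount -o ro /dev/md0 /mnt
> ```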
> 
>> The PSU wasn't dirt cheap: a Thermaltake TR2 500W @ $58.
> 
> The price isn't relevant.  The quality and rail configuration are, and
> whether it's been damaged.  I checked the spec on your TR2-500
> yesterday.  It has dual +12V rails, one rated at 18A and one at 17A.  I
> was unable to locate a wiring diagram for it.  On paper it should have
> plenty of juice for your gear when in working order.  My assumption here
> is that something internal to it may have failed.
> 
>> Should I buy all new
>> everything?  
> 
> I wouldn't.  Most of your gear is probably fine.  Get the PSU swapped
> out and see if that fixes it.  You may still have to wipe the drives and
> build a new array.  You should know pretty quickly whether the PSU
> swap fixed the problem: either drives keep dropping or they don't.  You
> already have a new mobo in hand, so if the PSU isn't the problem, swap
> the mobo.  That's a good chassis design with good airflow assuming you
> keep the front fans in it.  Why you'd leave them removed is beyond me.
> 
>> If so, while I'm at it, can you suggest a set of consumer-level
>> hardware ideal for running a personal mdadm server?  Powered but not
>> overpowered, reliable but not bleeding edge.  If I need 6-8 SATA
>> ports, should I go onboard or get a controller?
> 
> A new HBA shouldn't be necessary.  But if you choose to go that route
> further down the road I'd recommend an LSI 9211-8i.
> 
>> I still have one backup, although I'm very nervous now since it's on
>> a 3-disk RAID0 (created in an emergency), just asking to implode.
> 
> I assume this resides on a different machine.
> 
> Swap the PSU.  Recover the array if possible.  If not, blow it away
> and create a new one.  If no drives drop out, you're probably golden
> and the PSU fixed the problem.  If they drop, swap in the new mobo.
> At that point
> you'll have replaced everything that could be the source of the problem
> but for the remaining original drives.  They can't all be bad, if any.
> Always run with those front fans installed.
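> 
> While you're testing, keep an eye on member state and the kernel log
> so a drop is caught the moment it happens:
> 
> ```shell
> # Refresh the array status every 10 seconds
> watch -n 10 cat /proc/mdstat
> 
> # Follow the kernel log; link resets and I/O errors show up here
> # before mdadm kicks a member out
> dmesg -w
> ```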
> 
