On 7/2/2013 10:48 AM, Barrett Lewis wrote:
> After sending the last email I went out and bought 2 new WD reds, and
> a new motherboard.  I came back and in those 2 hours all but 1 of my
> drives failed to the point of being unable to read the superblock so
> it really seems like my array is ended

The drive may be ok.  They all may be.

> On Mon, Jul 1, 2013 at 8:57 PM, Stan Hoeppner <stan@xxxxxxxxxxxxxxxxx> wrote:
>>> I noticed one drive was going up and down and determined that
>>> the drive had actual physical damage to the power connecter and
>>> was losing and regaining power through vibration.
>>
>> This intermittent contact could have damaged the PSU.  You've continued
>> to have drive and lockup problems since replacing this drive with bad
>> connector.
>
> I hadn't thought of it until you said so but I bet you are right about
> the iffy connector.  It certainly seemed as if I never had an issue
> with the array for 8 months, and then suddenly everything got unstable
> at once, and since then I've lost atleast 6 hard drives.

Your drives may not be toast.  Don't toss them out, and don't throw up
your hands yet.

>> The pink elephant in the room is thermal failure due to insufficient
>> airflow.  The symptoms you describe sound like drives overheating.  What
>> chassis is this?  Make/model please.  If you've installed individual
>> drive hot swap cages, etc, it would be helpful if you snapped a photo or
>> two and made those available.
>
> It is also possible that there were cooling issues.  The case is an
> NZXT H2.  It has some fans blowing directly on all the hard drives,
> but there were a few times I have to admit I took the fans off to work
> on things and forgot to put them back on for a few days, coming back
> to find them very hot to the touch.  I would have mentioned that
> earlier, but a data recovery place told me that it was unlikely that
> would be a culprit (after they had my money).

I checked out the chassis on the NZXT site.
With the front fans removed, you have only 2x120mm low rpm, low static
pressure, low CFM exhaust fans: one in the PSU, one top rear.  With 8
drives packed in such close proximity and with other lower resistance
intake paths (the perforated chassis bottom), you won't get enough air
through the front drive cage to cool those drives properly over a long
period.  However, running with the two front fans removed for a couple
of days on an occasion or two shouldn't have overheated the drives to
the point of permanent damage, assuming ambient air temp was ~75F or
lower, and assuming you were not performing long array operations such
as rebuilds or reshapes.  If you were, the drives could have gotten hot
enough, for long enough, to be permanently damaged.

> Maybe thats all academic at this point.  I guess i'll have to rebuild
> my server from scratch since all my disks seem destroyed and I can't
> trust the mobo, cpu, or psu.

Don't start over.  Not just yet.  Leave everything as is for now.
Simply replace the PSU.  Fire it up and see what you can recover.

> The psu wasn't dirt cheap, Thermaltake TR2 500w @ $58.

The price isn't relevant.  The quality and rail configuration are, and
whether it's been damaged.  I checked the spec on your TR2-500
yesterday.  It has dual +12V rails, one rated at 18A and one at 17A.  I
was unable to locate a wiring diagram for it.  On paper it should have
plenty of juice for your gear when in working order.  My assumption
here is that something internal to it may have failed.

> Should I buy all new
> everything?

I wouldn't.  Most of your gear is probably fine.  Get the PSU swapped
out and see if that fixes it.  You may still have to wipe the drives
and build a new array.  You should know pretty quickly whether the PSU
swap fixed the problem: either the drives stop dropping or they don't.
You already have a new mobo in hand, so if the PSU isn't the problem,
swap the mobo.

That's a good chassis design with good airflow assuming you keep the
front fans in it.
Why you'd leave them removed is beyond me.

> If so, while I'm at can you suggest a set of consumer
> level hardware ideal running a personal mdadm server.  Powered but not
> overpowered, reliable not bleeding edge.  If I need 6-8 sata ports,
> should I do onboard or get a controller?

A new HBA shouldn't be necessary.  But if you choose to go that route
further down the road I'd recommend an LSI 9211-8i.

> I still have one backup allthough I'm very nervous now since it's on a
> 3 disk RAID0, just asking to implode (created in an emergency).

I assume this resides on a different machine.

Swap the PSU.  Recover the array if possible.  If not, blow it away and
create a new one.  If no drives drop out you're probably golden and the
PSU swap fixed the problem.  If they drop, swap in the new mobo.  At
that point you'll have replaced everything that could be the source of
the problem except the remaining original drives.  They can't all be
bad, if any are.  Always run with those front fans installed.

-- 
Stan
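
For anyone who wants to sanity check the "on paper" remark about the
+12V rails, here is a rough sketch of the arithmetic in Python.  The
18A and 17A rail ratings and the count of 8 drives come from the
message above.  The per-drive spin-up current is an assumed placeholder,
not a measured or datasheet figure, and the function names are purely
illustrative.  Because the rail wiring is unknown, the sketch assumes
the least favourable case where every drive hangs off the weaker rail.
It ignores the CPU and motherboard draw on +12V, so treat the result as
an optimistic bound rather than a verdict on the PSU.

```python
# Rough +12V headroom check for drive spin-up on a dual-rail PSU.

RAIL_AMPS = (18.0, 17.0)       # +12V rail ratings quoted in the message
DRIVE_COUNT = 8                # drives in the front cage, per the message
SPINUP_AMPS_PER_DRIVE = 2.0    # ASSUMPTION: placeholder, use the drive datasheet


def drive_load_amps(drive_count, amps_per_drive):
    """Total +12V current if every drive spins up at the same moment."""
    return drive_count * amps_per_drive


def worst_case_headroom_amps(rail_amps, load_amps):
    """Headroom if the whole drive load sits on the weakest rail.

    The rail wiring is unknown, so this assumes the least favourable
    split.  A negative result means the rail rating would be exceeded.
    """
    return min(rail_amps) - load_amps


if __name__ == "__main__":
    load = drive_load_amps(DRIVE_COUNT, SPINUP_AMPS_PER_DRIVE)
    headroom = worst_case_headroom_amps(RAIL_AMPS, load)
    print(f"spin-up load: {load:.1f} A, worst-case headroom: {headroom:.1f} A")
```

Substitute the spin-up figure from the actual drive datasheets before
drawing any conclusion; a small or negative headroom would only show
that staggered spin-up or a different cable split is worth a look, not
that the PSU is faulty.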