Re: Busted disks caused healthy ones to fail


 



14 internal drives on a single power supply, plus the mb/cpu/etc? Oy. I've got 15 drives plus a p2-400 spinning between two 550w power supplies, and I'm worried that it is getting overloaded. I might be paranoid, but I had some flakiness that was pretty much impossible to debug, so I took broad steps and overestimated. I figured that a heavily loaded supply might hiccup under an unusual condition if too many drives were attached to it. While anecdotal, my once-a-month drive hiccup (requiring a re-add to the array, nothing else) did go away when I added a power supply.
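For anyone wanting to sanity-check a supply against a drive count, here is a rough sketch of the arithmetic. The function name and every number in the example calls are placeholders, not measurements from either machine in this thread; the real figures would come from the supply's label (amps available on the 12 V rail) and the drive datasheet (spin-up draw, which is normally the peak).

```python
def rail_headroom(rail_amps, drive_count, amps_per_drive, other_amps=0.0):
    """Return spare amps on one rail; a negative result means over budget.

    rail_amps      -- rated current of the rail (from the supply's label)
    drive_count    -- number of drives fed from that rail
    amps_per_drive -- worst-case draw per drive (spin-up, from the datasheet)
    other_amps     -- anything else on the same rail (fans, board, etc.)
    """
    return rail_amps - (drive_count * amps_per_drive + other_amps)


if __name__ == "__main__":
    # Placeholder values only: a 20 A rail and 2 A per drive at spin-up.
    print(rail_headroom(20.0, 14, 2.0))       # all drives on one supply
    print(rail_headroom(20.0, 7, 2.0, 1.5))   # drives split across two supplies
```

If the first figure comes out negative with real numbers, staggered spin-up or a second supply is the usual way out.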

comsatcat wrote:

The two disks that were actually dead were both on a different bus.  The
OS disk that died was on scsi0.

Is there a way around this behavior (i.e. kernel params that can be
adjusted, such as timeout values and queuing)?  It never really recovered
correctly after the disks died; a manual reboot was required.
Applications which were using the failed devices would hang forever (I'm
assuming they were waiting for queued commands to complete).
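
On the timeout question: on 2.6-series and later kernels the SCSI disk driver generally exposes a per-device command timeout, in seconds, as /sys/block/<dev>/device/timeout. The sketch below only reads those values so they can be inspected. The function name is made up for illustration, and whether changing the timeout would have prevented the hangs described here is an assumption the thread does not confirm.

```python
from pathlib import Path


def read_scsi_timeouts(sysfs_block="/sys/block"):
    """Return {device_name: timeout_seconds} for block devices that
    expose a device/timeout attribute under the given sysfs directory."""
    timeouts = {}
    root = Path(sysfs_block)
    if not root.is_dir():
        return timeouts
    for dev in sorted(root.iterdir()):
        timeout_file = dev / "device" / "timeout"
        try:
            timeouts[dev.name] = int(timeout_file.read_text().strip())
        except (OSError, ValueError):
            # No timeout attribute (e.g. not a SCSI disk) or an
            # unparsable value: skip this device.
            continue
    return timeouts


if __name__ == "__main__":
    for name, seconds in read_scsi_timeouts().items():
        print(f"{name}: {seconds}s")
```

Changing a value means writing a new number of seconds to the same file as root, and the setting does not persist across reboots.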

IDE: not in use
Power: 14 internal drives, no external
Temp: just fine
Kids: Upstairs taking tech calls.


Thanks, Ben


On Tue, 2004-12-14 at 01:55 -0500, Guy wrote:


Did the disks that failed have anything in common?

SCSI:
If you have disks on 1 SCSI bus, a single failed disk can affect other
disks.  By removing the bad disk you correct the problems with the others.

IDE:  (or whatever they call it today)
With 2 disks on 1 bus, 1 drive failure will cause the other to fail most of
the time.

Power supply:
If you have external disks, they will have another power supply.  If you
have problems with this power supply, they all could be affected.  Even a
common power cable can cause multi drive failures.

Temperature:
Disks getting too hot can cause failures.

Kids:
Someone turned the disk cabinet off?

I am sure this list is not complete.  But it may help.

Guy

-----Original Message-----
From: linux-raid-owner@xxxxxxxxxxxxxxx
[mailto:linux-raid-owner@xxxxxxxxxxxxxxx] On Behalf Of comsatcat
Sent: Tuesday, December 14, 2004 1:42 AM
To: linux-raid@xxxxxxxxxxxxxxx
Subject: Busted disks caused healthy ones to fail

An odd thing happened this weekend.  We were doing some heavy I/O when
one of our servers had two drives in two separate raid1 mirrors pop.
This was not odd, as these drives are old and the batch they are from
has been failing on other boxen as well.  What is odd is that our brand
new disks, which the OS resides on (2 drives in raid 1), half busted.

There are 4 md devices:

md/0
md/1
md/2
md/3


md3, md2, and md1 all lost the 2nd drive in the array (sdh3, sdh6, and
sdh5).  md0 however was fine with sdh1 being fine.  Why would losing
disks cause a seemingly healthy disk to go astray?
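
A quick way to see which arrays have dropped a member is to parse /proc/mdstat. The sketch below assumes the common layout in which each array has a header line listing members (failed ones flagged with "(F)") followed by a status line such as "[2/1] [U_]". The SAMPLE text and its device names are invented for illustration and are not output from the server described above.

```python
import re

HEADER_RE = re.compile(r"^(md\d+)\s*:\s*(.*)$")
MEMBER_RE = re.compile(r"(\w+)\[\d+\](\(F\))?")
STATUS_RE = re.compile(r"\[(\d+)/(\d+)\]\s+\[([U_]+)\]")

# Invented example text, not taken from the poster's machine.
SAMPLE = """\
Personalities : [raid1]
md0 : active raid1 sdb1[1] sda1[0]
      104320 blocks [2/2] [UU]
md1 : active raid1 sdb2[1](F) sda2[0]
      1048576 blocks [2/1] [U_]
unused devices: <none>
"""


def degraded_arrays(mdstat_text):
    """Return {array: {"failed", "expected", "active"}} for every array
    whose status line shows fewer active members than expected."""
    result = {}
    current, members = None, []
    for line in mdstat_text.splitlines():
        header = HEADER_RE.match(line)
        if header:
            current = header.group(1)
            members = MEMBER_RE.findall(header.group(2))
            continue
        status = STATUS_RE.search(line)
        if current and status:
            expected, active = int(status.group(1)), int(status.group(2))
            if active < expected:
                failed = [name for name, flag in members if flag]
                result[current] = {
                    "failed": failed,
                    "expected": expected,
                    "active": active,
                }
            current = None  # ignore further lines until the next header
    return result


if __name__ == "__main__":
    print(degraded_arrays(SAMPLE))
```

On a live system the same function can be fed the contents of /proc/mdstat instead of SAMPLE.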

P.S. I have pulled out tons of syslogs showing the two bad disks failing,
if that would help.


Thanks, Ben

-
To unsubscribe from this list: send the line "unsubscribe linux-raid" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html







