Re: Computer suddenly failed

Jevos, Peter wrote:

> I'd like to ask you about a strange problem. I hope I chose the correct
> mailing list.
> I have 2 IDE disks in RAID 1 with ReiserFS.
> I once noticed these messages in the log:
> 
> hde: dma_timer_expiry: dma status == 0x20
> hde: timeout waiting for DMA
> PDC202XX: Primary channel reset.
> hde: timeout waiting for DMA
> hde: (__ide_dma_test_irq) called while not waiting
> hde: status timeout: status=0xd0 { Busy }
>  PDC202XX: Primary channel reset.
> hde: drive not ready for command
> ide2: reset: success
> hde: status error: status=0x58 { DriveReady SeekComplete DataRequest }
> hde: drive not ready for command
> hde: status error: status=0x58 { DriveReady SeekComplete DataRequest }

Your drive has died.

> DMA on hde had been turned off, so I turned it on again. Then I tried to
> back up the files on hde, but when I ran tar the computer stopped
> responding, even to SysRq, and nothing was written to the log. I had to
> do a hard restart. This has happened 4 times, each time I tried to do
> something with files on hde. Now I'm afraid to do anything on that
> machine; unfortunately it is a production server.

Once you have replaced the drive:

1. Ensure that the drives are being cooled. Modern (i.e. large) hard
drives tend to run quite hot. In the absence of sufficient airflow,
they can easily exceed their maximum operating temperature (typically
55C). This is more of an issue with larger drives, and with multiple
drives in adjacent drive bays. In my experience, Maxtor drives tend to
run hotter than similar drives from other vendors.

2. Run a temperature-monitoring utility such as hddtemp, and ensure
that it will notify support staff if the temperature gets too high. If
the location isn't staffed 24/7, ensure that it will shut down the
system in the event that the temperature exceeds the drives' operating
limit.

[I know of a case where a cooling fan in a file server failed
overnight, and the staff turned up the following morning to find that
all 4 drives had failed after reaching temperatures of up to 63C.]
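
The monitoring in step 2 could be sketched roughly as below, e.g. run
from cron every few minutes. This is only an illustration, not a tested
tool: it assumes that "hddtemp -n <device>" prints the bare temperature
in degrees C, and the 50C warning threshold, the function names, and the
printed actions are all hypothetical placeholders. Take the real limit
from the drive's datasheet, and replace the print calls with whatever
notification and shutdown commands suit your site.

```python
import subprocess
import sys

WARN_C = 50  # assumed warning threshold; pick a value to suit your drives
MAX_C = 55   # typical maximum operating temperature mentioned above


def parse_temp(output):
    """Return the temperature as an int, or None if output is not a number."""
    try:
        return int(output.strip())
    except ValueError:
        # e.g. the drive is asleep or has no temperature sensor
        return None


def classify(temp_c, warn_c=WARN_C, max_c=MAX_C):
    """Map a temperature reading to 'ok', 'warn', 'shutdown' or 'unknown'."""
    if temp_c is None:
        return "unknown"
    if temp_c >= max_c:
        return "shutdown"
    if temp_c >= warn_c:
        return "warn"
    return "ok"


def read_temp(device):
    """Ask hddtemp for the temperature of one device (assumes -n output)."""
    result = subprocess.run(
        ["hddtemp", "-n", device],
        capture_output=True, text=True, check=False,
    )
    return parse_temp(result.stdout)


def main(devices):
    for dev in devices:
        temp = read_temp(dev)
        state = classify(temp)
        if state == "shutdown":
            # Placeholder: a real script would halt the machine here.
            print(f"{dev}: {temp}C, at or above limit, shut down now")
        elif state == "warn":
            # Placeholder: a real script would mail or page support staff.
            print(f"{dev}: {temp}C, notify support staff")
        elif state == "unknown":
            print(f"{dev}: no temperature reading")


if __name__ == "__main__":
    # Usage: script.py /dev/hde /dev/hdg
    main(sys.argv[1:])
```

An unreadable temperature is reported as its own state rather than
treated as "ok", so a drive that has stopped answering does not pass
unnoticed.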

-- 
Glynn Clements <glynn@xxxxxxxxxxxxxxxxxx>