Alfred von Campe wrote:
It's time to resurrect this thread from way back in June. The problem in the subject line has reared its ugly head again, but this time with a twist that makes it much worse.

A little refresher on what was happening back then. Every so often the root file system would be remounted read-only, with the error in the subject line appearing over and over again on the console.

Lately, this has been happening every 10-14 days, and I would have to reboot my system. Since the root file system was not writable, no error messages were logged in /var/log/messages. So I configured syslog to write messages to another system as well, and this time I have captured some errors (see below). BTW, this is a SATA drive.

What makes it much worse this time, is that the system won't boot! When I try to boot now I get the following error over and over again:

ata1: translated ATA stat/err 0x51/40 to SCSI SK/ASC/ASCQ 0x3/11/04

HELP! Is there anything I can do to recover this system?

Alfred

Here are the first 50 lines from /var/log/messages (including the first occurrence of the error in the subject line):

Aug 1 18:57:04 balboa01 kernel: ata1: command 0x35 timeout, stat 0xb7 host_stat 0x21
Aug 1 18:57:04 balboa01 kernel: ata1: translated ATA stat/err 0xb7/00 to SCSI SK/ASC/ASCQ 0xb/47/00
Aug 1 18:57:04 balboa01 kernel: ata1: status=0xb7 { Busy }
Aug 1 18:57:04 balboa01 kernel: SCSI error : <0 0 0 0> return code = 0x8000002
Aug 1 18:57:04 balboa01 kernel: Current sda: sense key Aborted Command
Aug 1 18:57:04 balboa01 kernel: Additional sense: Scsi parity error
Aug 1 18:57:04 balboa01 kernel: end_request: I/O error, dev sda, sector 224365
Aug 1 18:57:04 balboa01 kernel: ATA: abnormal status 0xB7 on port 0x1F7
Aug 1 18:57:04 balboa01 last message repeated 2 times
Aug 1 18:57:04 balboa01 kernel: ata1: command 0x35 timeout, stat 0xb7 host_stat 0x21
Aug 1 18:57:04 balboa01 kernel: ata1: translated ATA stat/err 0xb7/00 to SCSI SK/ASC/ASCQ 0xb/47/00
Aug 1 18:57:04 balboa01 kernel: ata1: status=0xb7 { Busy }
Aug 1 18:57:04 balboa01 kernel: SCSI error : <0 0 0 0> return code = 0x8000002
Aug 1 18:57:04 balboa01 kernel: Current sda: sense key Aborted Command
Aug 1 18:57:04 balboa01 kernel: Additional sense: Scsi parity error
Aug 1 18:57:04 balboa01 kernel: end_request: I/O error, dev sda, sector 233795925
Aug 1 18:57:04 balboa01 kernel: Buffer I/O error on device dm-0, logical block 29198337
Aug 1 18:57:04 balboa01 kernel: lost page write due to I/O error on dm-0
Aug 1 18:57:04 balboa01 kernel: ATA: abnormal status 0xB7 on port 0x1F7
Aug 1 18:57:04 balboa01 last message repeated 2 times
Aug 1 18:57:04 balboa01 kernel: ata1: command 0x35 timeout, stat 0xb7 host_stat 0x21
Aug 1 18:57:04 balboa01 kernel: ata1: translated ATA stat/err 0xb7/00 to SCSI SK/ASC/ASCQ 0xb/47/00
Aug 1 18:57:04 balboa01 kernel: ata1: status=0xb7 { Busy }
Aug 1 18:57:04 balboa01 kernel: SCSI error : <0 0 0 0> return code = 0x8000002
Aug 1 18:57:04 balboa01 kernel: Current sda: sense key Aborted Command
Aug 1 18:57:04 balboa01 kernel: Additional sense: Scsi parity error
Aug 1 18:57:04 balboa01 kernel: end_request: I/O error, dev sda, sector 224373
Aug 1 18:57:04 balboa01 kernel: Buffer I/O error on device dm-0, logical block 1893
Aug 1 18:57:04 balboa01 kernel: lost page write due to I/O error on dm-0
Aug 1 18:57:04 balboa01 kernel: ATA: abnormal status 0xB7 on port 0x1F7
Aug 1 18:57:04 balboa01 last message repeated 2 times
Aug 1 18:57:04 balboa01 kernel: Aborting journal on device dm-0.
Aug 1 18:57:04 balboa01 kernel: ata1: command 0x35 timeout, stat 0xb7 host_stat 0x21
Aug 1 18:57:04 balboa01 kernel: ata1: translated ATA stat/err 0xb7/00 to SCSI SK/ASC/ASCQ 0xb/47/00
Aug 1 18:57:04 balboa01 kernel: ata1: status=0xb7 { Busy }
Aug 1 18:57:04 balboa01 kernel: SCSI error : <0 0 0 0> return code = 0x8000002
Aug 1 18:57:04 balboa01 kernel: Current sda: sense key Aborted Command
Aug 1 18:57:04 balboa01 kernel: Additional sense: Scsi parity error
Aug 1 18:57:04 balboa01 kernel: end_request: I/O error, dev sda, sector 172585309
Aug 1 18:57:04 balboa01 kernel: Buffer I/O error on device dm-0, logical block 21547010
Aug 1 18:57:04 balboa01 kernel: lost page write due to I/O error on dm-0
Aug 1 18:57:04 balboa01 kernel: ATA: abnormal status 0xB7 on port 0x1F7
Aug 1 18:57:04 balboa01 last message repeated 2 times
Aug 1 18:57:04 balboa01 kernel: ext3_abort called.
Aug 1 18:57:04 balboa01 kernel: EXT3-fs error (device dm-0): ext3_journal_start_sb: Detected aborted journal
Aug 1 18:57:04 balboa01 kernel: Remounting filesystem read-only
Aug 1 18:57:04 balboa01 kernel: ata1: command 0x35 timeout, stat 0xb7 host_stat 0x21
Aug 1 18:57:04 balboa01 kernel: EXT3-fs error (device dm-0) in start_transaction: Journal has aborted
Aug 1 18:57:34 balboa01 kernel: ata1: command 0x35 timeout, stat 0xb7 host_stat 0x21
Aug 1 18:57:34 balboa01 kernel: ata1: translated ATA stat/err 0xb7/00 to SCSI SK/ASC/ASCQ 0xb/47/00

_______________________________________________
CentOS mailing list
CentOS@xxxxxxxxxx
http://lists.centos.org/mailman/listinfo/centos
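The "stat/err" pairs in the quoted messages (0x51/40 at boot, 0xb7/00 in the syslog capture) are the ATA status and error register values, and it can help to decode them bit by bit before guessing at a cause. The following is only a rough sketch: the bit names are the conventional ATA abbreviations, and the helper names (decode_bits, decode_status, decode_line) are made up for this example and are not part of any kernel or library interface.

```python
import re

# Conventional names for the ATA status register bits.
ATA_STATUS_BITS = {
    0x80: "BSY",   # device busy; the other bits are not meaningful while set
    0x40: "DRDY",  # device ready
    0x20: "DF",    # device fault / write fault
    0x10: "DSC",   # seek complete
    0x08: "DRQ",   # data request
    0x04: "CORR",  # corrected data (obsolete in later ATA revisions)
    0x02: "IDX",   # index (obsolete)
    0x01: "ERR",   # error; details are in the error register
}

# Conventional names for the ATA error register bits.
ATA_ERROR_BITS = {
    0x80: "ICRC",   # interface CRC error (bad block mark on very old drives)
    0x40: "UNC",    # uncorrectable data error
    0x20: "MC",     # media changed
    0x10: "IDNF",   # sector ID not found
    0x08: "MCR",    # media change requested
    0x04: "ABRT",   # command aborted
    0x02: "TK0NF",  # track 0 not found
    0x01: "AMNF",   # address mark not found
}

# Matches e.g. "stat/err 0x51/40" as printed in the kernel messages above.
STAT_ERR_RE = re.compile(r"stat/err 0x([0-9a-fA-F]{2})/([0-9a-fA-F]{2})")


def decode_bits(value, table):
    """Return the names of all bits from `table` that are set in `value`."""
    return [name for mask, name in table.items() if value & mask]


def decode_status(status):
    """Decode a status byte; if BSY is set, report only BSY."""
    if status & 0x80:
        return ["BSY"]
    return decode_bits(status, ATA_STATUS_BITS)


def decode_line(line):
    """Return (status names, error names) for a log line, or None if no match."""
    match = STAT_ERR_RE.search(line)
    if match is None:
        return None
    status = int(match.group(1), 16)
    error = int(match.group(2), 16)
    return decode_status(status), decode_bits(error, ATA_ERROR_BITS)


if __name__ == "__main__":
    for line in (
        "ata1: translated ATA stat/err 0x51/40 to SCSI SK/ASC/ASCQ 0x3/11/04",
        "ata1: translated ATA stat/err 0xb7/00 to SCSI SK/ASC/ASCQ 0xb/47/00",
    ):
        print(decode_line(line))
```

With these tables, 0x51/40 decodes to DRDY, DSC and ERR with UNC set in the error register, i.e. an uncorrectable data error, which fits the kernel's translation to SCSI sense key 0x3 (Medium Error). The 0xb7 status has BSY set, matching the "{ Busy }" the kernel prints, so its remaining bits should not be read as individual flags.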
Maybe the disk is dying? Did you run smartd (it requires -d ata for SATA disks; this option needs to be put in smartd.conf)?
The error messages could also indicate bad cables.

I would boot from the CentOS 4.3 Live-CD and take a look at the disk with smartctl. If the disk is indeed dying, I'd try to save its contents to a fresh disk using ddrescue. Unfortunately there are two programs with this name (http://www.garloff.de/kurt/linux/ddrescue/ and http://www.gnu.org/software/ddrescue/ddrescue.html); I have had very good results with the latter. I don't know if it's on the LiveCD (if not, it should be!).
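As a rough sketch of the rescue procedure described above, the snippet below only assembles and prints the command lines; it does not run anything. It assumes smartmontools' smartctl and GNU ddrescue are available on the rescue system, and that the failing disk is /dev/sda as in the quoted log. The target device /dev/sdb and the map file name rescue.map are hypothetical placeholders, and the ddrescue options follow the two-pass pattern from the GNU ddrescue manual, so check them against the installed version before use.

```python
import shlex


def smartctl_cmd(device, device_type="ata"):
    """Build a smartctl call that prints all SMART information for a device.

    `-d ata` selects the ATA device type, `-a` prints all SMART data.
    """
    return ["smartctl", "-d", device_type, "-a", device]


def ddrescue_cmds(source, target, mapfile):
    """Build a two-pass GNU ddrescue invocation (assumed option set).

    Pass 1 copies the easily readable areas first and skips the slow parts
    (-n); -f is needed because the output is a block device.
    Pass 2 retries the bad areas up to three times (-r3) with direct disc
    access (-d), reusing the same map file to resume where pass 1 stopped.
    """
    first_pass = ["ddrescue", "-f", "-n", source, target, mapfile]
    retry_pass = ["ddrescue", "-d", "-f", "-r3", source, target, mapfile]
    return [first_pass, retry_pass]


def render(cmd):
    """Render an argument list as a shell-safe command line for display."""
    return " ".join(shlex.quote(arg) for arg in cmd)


if __name__ == "__main__":
    # /dev/sdb and rescue.map are placeholders; double-check the target
    # device, because ddrescue overwrites it.
    print(render(smartctl_cmd("/dev/sda")))
    for cmd in ddrescue_cmds("/dev/sda", "/dev/sdb", "rescue.map"):
        print(render(cmd))
```

Keeping the map file on a third, healthy disk (not on the failing one or the target) lets an interrupted copy be resumed later.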
If the disk shows no SMART errors you could use e2fsck.

HTH,
Kay