Instability with later 4.x kernels?

I have an Athlon machine with about 10 HDDs plugged in, used primarily for disk-to-disk backups. Some drives are PATA, some are SATA, and some are USB. It's a strange concoction, but it has been relatively stable for some 4-5 years, despite numerous upgrades. It has been running CentOS 4 for years.


Recently it has become unstable, and after two weeks of swapping hardware I found that booting an earlier kernel restores its stability!


It takes a few days before anything "goes south", so debugging is very slow. When it does, I get random read errors, either SCSI errors or (a few times) HDA read errors.


Once the read errors begin, the system becomes very unresponsive and often won't restart on its own, even if I wait for hours; I end up having to hit the "kill switch".


# uname -a
Linux backuphost 2.6.9-67.0.22.EL #1 Wed Jul 23 17:17:45 EDT 2008 i686 athlon i386 GNU/Linux


The failures occur on all /dev/sd* devices, even those that are USB. Once, /dev/hdc had a similar problem after /dev/sdb had failed. I don't know whether the device mapping below helps:


/dev/hda - PATA, on motherboard, 20 GB
/dev/hdb - IDE CDROM
/dev/hdc - IDE, on motherboard, 500 GB
/dev/hdd - IDE, on motherboard, 300 GB
/dev/hde - IDE, on a PCI card, 500 GB
/dev/sda - SATA, on a PCI card, 1 TB
/dev/sdb - SATA, on a PCI card, 1 TB
/dev/sdc - USB, on a USB 2.0 PCI card, 750 GB
/dev/sde - USB, on a USB 2.0 PCI card, 750 GB
/dev/sdf - USB, on a USB 2.0 PCI card, 1 TB



Here's what I see in /var/log/messages:


May 27 05:08:42 hume ntpd[4844]: kernel time sync enabled 0001
May 27 08:01:01 hume kernel: SCSI error : <0 0 0 0> return code = 0x40000
May 27 08:01:01 hume kernel: end_request: I/O error, dev sda, sector 12847
May 27 08:01:01 hume kernel: EXT3-fs error (device sda1): ext3_find_entry: reading directory #2 offset 0
May 27 08:01:01 hume kernel:
May 27 08:14:27 hume kernel: SCSI error : <0 0 0 0> return code = 0x40000
May 27 08:14:27 hume kernel: end_request: I/O error, dev sda, sector 12847
May 27 08:14:27 hume kernel: EXT3-fs error (device sda1): ext3_find_entry: reading directory #2 offset 0
May 27 08:14:27 hume kernel:
May 27 10:28:30 hume ntpd[4844]: synchronized to 63.240.161.99, stratum 2
May 27 11:48:07 hume sshd(pam_unix)[26873]: session opened for user root by (uid=0)
May 27 11:48:10 hume kernel: SCSI error : <0 0 0 0> return code = 0x40000
May 27 11:48:10 hume kernel: end_request: I/O error, dev sda, sector 12847
May 27 11:48:10 hume kernel: EXT3-fs error (device sda1): ext3_find_entry: reading directory #2 offset 0
May 27 11:48:10 hume kernel:
May 27 11:48:16 hume kernel: SCSI error : <0 0 0 0> return code = 0x40000
May 27 11:48:16 hume kernel: end_request: I/O error, dev sda, sector 12847
May 27 11:48:16 hume kernel: EXT3-fs error (device sda1): ext3_readdir: directory #2 contains a hole at offset 0
May 27 11:48:23 hume kernel: SCSI error : <0 0 0 0> return code = 0x40000
May 27 11:48:23 hume kernel: end_request: I/O error, dev sda, sector 12847
May 27 11:48:23 hume kernel: EXT3-fs error (device sda1): ext3_readdir: directory #2 contains a hole at offset 0
May 27 11:48:24 hume kernel: SCSI error : <0 0 0 0> return code = 0x40000
May 27 11:48:24 hume kernel: end_request: I/O error, dev sda, sector 12847
May 27 11:48:24 hume kernel: EXT3-fs error (device sda1): ext3_readdir: directory #2 contains a hole at offset 0
May 27 11:48:38 hume kernel: SCSI error : <0 0 0 0> return code = 0x40000
May 27 11:48:38 hume kernel: end_request: I/O error, dev sda, sector 0
May 27 11:48:38 hume kernel: Buffer I/O error on device sda, logical block 0
May 27 11:48:38 hume kernel: SCSI error : <0 0 0 0> return code = 0x40000
May 27 11:48:38 hume kernel: end_request: I/O error, dev sda, sector 8
May 27 11:48:38 hume kernel: Buffer I/O error on device sda, logical block 1
May 27 11:48:38 hume kernel: SCSI error : <0 0 0 0> return code = 0x40000
May 27 11:48:38 hume kernel: end_request: I/O error, dev sda, sector 16
May 27 11:48:38 hume kernel: Buffer I/O error on device sda, logical block 2
May 27 11:48:38 hume kernel: SCSI error : <0 0 0 0> return code = 0x40000
May 27 11:48:38 hume kernel: end_request: I/O error, dev sda, sector 24
May 27 11:48:38 hume kernel: Buffer I/O error on device sda, logical block 3
May 27 11:48:38 hume kernel: SCSI error : <0 0 0 0> return code = 0x40000
May 27 11:48:38 hume kernel: end_request: I/O error, dev sda, sector 32
May 27 11:48:38 hume kernel: Buffer I/O error on device sda, logical block 4
May 27 11:48:38 hume kernel: SCSI error : <0 0 0 0> return code = 0x40000
May 27 11:48:38 hume kernel: end_request: I/O error, dev sda, sector 40
May 27 11:48:38 hume kernel: Buffer I/O error on device sda, logical block 5
May 27 11:48:38 hume kernel: SCSI error : <0 0 0 0> return code = 0x40000
May 27 11:48:38 hume kernel: end_request: I/O error, dev sda, sector 48
May 27 11:48:38 hume kernel: Buffer I/O error on device sda, logical block 6
.. MANY MEGABYTES OF STUFF LIKE THIS ..
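
Since the log runs to many megabytes, it may be easier to compare failures across devices from a per-device summary than from the raw lines. Below is a minimal sketch in Python 3, assuming the "end_request: I/O error" line format shown above. The function name summarize_io_errors and the regular expression are illustrative only and not part of any existing tool. CentOS 4 itself ships a much older Python, so this would likely have to run against a copy of the log on another machine.

```python
import re
from collections import Counter

# Matches kernel lines such as:
#   May 27 08:01:01 hume kernel: end_request: I/O error, dev sda, sector 12847
# (assumption: every I/O error of interest uses this exact wording)
IO_ERROR = re.compile(r"end_request: I/O error, dev (\w+), sector (\d+)")


def summarize_io_errors(lines):
    """Return (error count per device, first timestamp seen per device)."""
    counts = Counter()
    first_seen = {}
    for line in lines:
        match = IO_ERROR.search(line)
        if match is None:
            continue  # not an I/O error line
        device = match.group(1)
        counts[device] += 1
        # A classic syslog timestamp is the first 15 characters of the line,
        # e.g. "May 27 08:01:01".
        first_seen.setdefault(device, line[:15])
    return counts, first_seen


# Small demonstration on lines copied from the log excerpt above.
sample = [
    "May 27 08:01:01 hume kernel: SCSI error : <0 0 0 0> return code = 0x40000",
    "May 27 08:01:01 hume kernel: end_request: I/O error, dev sda, sector 12847",
    "May 27 11:48:38 hume kernel: end_request: I/O error, dev sda, sector 0",
]
counts, first_seen = summarize_io_errors(sample)
for device, count in counts.most_common():
    print(f"{device}: {count} I/O errors, first at {first_seen[device]}")
```

To run it against the real log, pass an open file instead of the sample list, for example summarize_io_errors(open("/var/log/messages", errors="replace")). Seeing which device fails first on each occasion might help distinguish a single bad drive or controller from a kernel-wide problem.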


_______________________________________________
CentOS mailing list
CentOS@xxxxxxxxxx
http://lists.centos.org/mailman/listinfo/centos
