Sam Varshavchik wrote:
Every other week or so, I get a disk kicked out of my RAID, with this:
Jul 6 04:05:38 commodore kernel: (scsi1:A:0:0): scsi1: device overrun (status 10) on 0:0:0
Jul 6 04:05:38 commodore kernel: Unexpected busfree in DT Data-in phase, 1 SCBs aborted, PRGMCNT == 0x22f
Jul 6 04:05:38 commodore kernel: >>>>>>>>>>>>>>>>>> Dump Card State Begins <<<<<<<<<<<<<<<<<
Jul 6 04:05:38 commodore kernel: scsi1: Dumping Card State at program address 0x22d Mode 0x22
Jul 6 04:05:38 commodore kernel: Card was paused
… followed by a rather dry dump of the HBA's registers. The driver is aic79xx.
This does not look like a disk error to me. I re-add the drive to the
array, and it rebuilds with no downtime. SMART shows 0 in the defect list on
this drive, and over the disk's lifetime 0 uncorrectable reads and 1
uncorrectable write -- but this kernel barf has already happened 4-5 times
now, and it's getting rather annoying.
Looks more like a controller problem than a drive problem. Do you have a spare
HBA to test?
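If no spare HBA is at hand, one rough check is to see whether the errors
always name the same target or are spread across the bus. The sketch below
is only an illustration: it assumes the "(scsi1:A:0:0)" prefix in the log
means host:channel:target:LUN, and the function name count_by_target is
made up for this example.

```python
import re
from collections import Counter

# Assumption: the "(scsi1:A:0:0)" prefix on per-device kernel messages
# encodes host, channel, target and LUN, in that order.
DEVICE_RE = re.compile(r"\((scsi\d+):([A-Z]):(\d+):(\d+)\)")


def count_by_target(lines):
    """Count log lines per (host, channel, target); lines without a prefix are skipped."""
    counts = Counter()
    for line in lines:
        match = DEVICE_RE.search(line)
        if match:
            host, channel, target, _lun = match.groups()
            counts[(host, channel, int(target))] += 1
    return counts
```

Fed the lines of the system log, this gives a per-target tally. Errors that
always land on one target point toward that drive, its cable position, or
termination; errors scattered over several targets fit a controller or bus
problem better.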
-- Chris