I see the same behavior (you didn't say what error messages were in your log) -- I get a random disk that pops out every once in a while. Mine's fibre channel, though, with a qlogic controller. I don't even have to reboot anymore -- just remove it and re-add it to the raid set.

Here's the last:

Mar 15 10:20:49 yeti kernel: SCSI disk error : host 5 channel 0 id 0 lun 0 return code = 28000002
Mar 15 10:20:49 yeti kernel: Current sd41:01: sense key Hardware Error
Mar 15 10:20:49 yeti kernel: Additional sense indicates Internal target failure
Mar 15 10:20:49 yeti kernel: I/O error: dev 41:01, sector 8696056
Mar 15 10:20:49 yeti kernel: raid5: Disk failure on sdq1, disabling device. Operation continuing on 6 devices

Here's my history (I just added two more SCSI cards, so the channel on this set has moved from 3 to 5 now). Also, I should mention that I don't see these problems on my fibre-channel set.

Mar 15 10:20:49 yeti kernel: SCSI disk error : host 5 channel 0 id 0 lun 0 return code = 28000002
Nov 24 20:04:28 medusa kernel: SCSI disk error : host 2 channel 0 id 10 lun 0 return code = 10000
Nov  5 09:01:15 yeti kernel: SCSI disk error : host 3 channel 0 id 6 lun 0 return code = 28000002
Aug 14 13:47:57 yeti kernel: SCSI disk error : host 3 channel 0 id 1 lun 0 return code = 28000002
Aug  4 17:17:00 yeti kernel: SCSI disk error : host 3 channel 0 id 0 lun 0 return code = 28000002
Jul 29 08:09:29 yeti kernel: SCSI disk error : host 3 channel 0 id 4 lun 0 return code = 28000002

________________________________________
Michael D. Black   Principal Engineer
mblack@csihq.com   321-676-2923,x203
http://www.csihq.com   Computer Science Innovations
http://www.csihq.com/~mike   My home page
FAX 321-676-2355

----- Original Message -----
From: "Justin" <jb@dslreports.com>
To: <linux-raid@vger.kernel.org>
Sent: Tuesday, March 19, 2002 6:18 PM
Subject: Re: data corruption - the nightmare continues

FWIW I get the same thing ..
some of my raid1 arrays tend to become U_ after a few months of light use. Rebooting the box allows the device to be addressable again, and the disk is not, in fact, bad .. I can do a complete dd to the "bad" disk without error, then raidhotadd it back in again as well. After a few more months of uptime, it is U_ again..

On an example box where this happens, the kernel is SMP 2.4.2, the controller is the motherboard Adaptec 7896, the driver is aic7xxx, and the disks are ultra lvds. The cables and disk mounts are all by intel, so I do not suspect a termination or cabling issue. The motherboard is 440GX.

I am curious to see whether my other boxes, which are 2.4.18 SMP, will be more stable.

-Justin

On Tue, Mar 19, 2002 at 11:58:46PM +0100, Marcel wrote:
> Rainer Fuegenstein wrote:
> >
> > Additional sense indicates Unrecovered read error
> > I/O error: dev 08:19, sector 12850360
> > <big snip>
>
> You should enable verbose SCSI error reporting in the kernel. It's a
> compile time kernel option. This will tell you more about what's going
> on in the disk subsystem.
>
> The above error message is not enough and if it's all you get, even with
> verbose error reporting enabled, you should talk to people more familiar
> with the SCSI drivers. Meanwhile double-check whether SCSI bus
> termination is done "by the book". Failure to do so can also cause some
> nasty intermittent problems.
>
> Marcel
>
> -
> To unsubscribe from this list: send the line "unsubscribe linux-raid" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at http://vger.kernel.org/majordomo-info.html
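
For anyone comparing logs like the ones quoted in this thread, here is a rough sketch of pulling the fields out of a "SCSI disk error" line and splitting the return code into bytes. It assumes the return code is printed in hex and packs four bytes as driver/host/message/status from high to low, which is my understanding of 2.4-era kernels but is not confirmed anywhere in this thread. The names `parse_line` and `split_return_code` are made up for illustration. Symbolic names for the upper bytes depend on the kernel headers, so the sketch only reports raw values.

```python
import re

# Matches the "SCSI disk error" kernel log lines quoted in this thread.
LINE_RE = re.compile(
    r"host (?P<host>\d+) channel (?P<channel>\d+) id (?P<id>\d+) "
    r"lun (?P<lun>\d+) return code = (?P<code>[0-9a-fA-F]+)"
)


def split_return_code(code_hex):
    """Split the return code into four bytes.

    Assumption: the value is hex and laid out as
    driver | host | message | status, high byte first.
    """
    value = int(code_hex, 16)
    return {
        "driver_byte": (value >> 24) & 0xFF,
        "host_byte": (value >> 16) & 0xFF,
        "msg_byte": (value >> 8) & 0xFF,
        "status_byte": value & 0xFF,
    }


def parse_line(line):
    """Return the address fields and return-code bytes, or None if no match."""
    match = LINE_RE.search(line)
    if match is None:
        return None
    fields = {key: int(match.group(key)) for key in ("host", "channel", "id", "lun")}
    fields.update(split_return_code(match.group("code")))
    return fields


if __name__ == "__main__":
    sample = (
        "Mar 15 10:20:49 yeti kernel: SCSI disk error : "
        "host 5 channel 0 id 0 lun 0 return code = 28000002"
    )
    print(parse_line(sample))
```

Under that layout, 28000002 splits into driver byte 0x28, host byte 0x00, message byte 0x00 and status byte 0x02. A status of 0x02 is CHECK CONDITION in SCSI terms, which fits the sense key lines that follow it in the log. The lone 10000 entry from medusa has only the host byte set (0x01), so it looks like a different kind of failure from the rest.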