--- On Thu, 3/20/08, James Bottomley <James.Bottomley@xxxxxxxxxxxxxxxxxxxxx> wrote:
> From: James Bottomley <James.Bottomley@xxxxxxxxxxxxxxxxxxxxx>
> Subject: Re: aic94xx driver woes continued
> To: "Raoul Bhatia [IPAX]" <r.bhatia@xxxxxxx>
> Cc: linux-scsi@xxxxxxxxxxxxxxx
> Date: Thursday, March 20, 2008, 12:01 PM
> On Thu, 2008-03-20 at 19:43 +0100, Raoul Bhatia [IPAX] wrote:
> > hi there,
> >
> > we find ourself in the same situation as posted on this list before [1]
> >
> > first of all, the hardware details:
> >
> > System:
> > > Tyan Transport GT24-B3992
> > > Motherboard: Tyan B3992
> > > Dual Opteron 2218 (Dual-Core)
> > > 8GB RAM
> >
> > SAS Controller:
> > > product: AIC-9410W SAS (Razor ASIC RAID)
> > > vendor: Adaptec
> > >
> > > controler-bios: BIOS present (1,1), 1820
> > > controler-sequencer: Firmware version 1.1 (V30)
> >
> > Harddisks:
> > > 4x Seagate Cheetah 15K.5 ST373455SS
> >
> > There is a Software Raid10 on top of those 4 disks.
> > > vanilla kernel 2.6.25-rc5
> > > Debian GNU/Linux 4.0, AMD64
> >
> >
> > coming to the problem description itself:
> >
> > the server is booted, the raid is working as intended
> > > md4 : active raid10 sdb9[1] sda9[0] sdd9[3] sdc9[2]
> > > 100181120 blocks 64K chunks 2 near-copies [4/4] [UUUU]
> >
> > now we mount /dev/md4 to /home, cd there and run an io intensive task
> > such as stress, tiobench (or even raid-reinit is enough)
> > > stress --hdd 20 --hdd-bytes 2gb --hdd-noclean
> >
> > soon we see:
> > > aic94xx: escb_tasklet_complete: REQ_TASK_ABORT, reason=0x6
> > > sas: command 0xffff81023fb2ca80, task 0xffff81023ea7ab40, timed out: EH_NOT_HANDLED
> > > ...
> > > sas: Enter sas_scsi_recover_host
> > > sas: trying to find task 0xffff81023ea7ab40
> > > sas: sas_scsi_find_task: aborting task 0xffff81023ea7ab40
> > > ...
> > > sas: --- Exit sas_scsi_recover_host
> >
> > please se the attached logfile.
>
> This is all normal.
> Seagate drives are known for throwing protocol
> errors under stress at certain revs of firmware. That's what
> REQ_TASK_ABORT, reason=0x6 is.

Reason 6 just means "Protocol Error". Without access to the HW registers, the sequencer and, most importantly, a protocol link trace of the problem for analysis, you cannot be sure whose fault it is or why.

   Luben