First, I have to say I don't really know, but I'm willing to take a
couple of guesses.

The messages refer to SCSI phases, so this looks like a bus/driver issue
to me. I don't think it is related to something like a kernel deadlock.
The driver is reporting an error back to the kernel after timing out. In
fact, it looks like the driver is complaining that the bus died, meaning
it isn't getting any responses from devices. That leads me to think you
might have a physical problem, or perhaps the electrical noise level has
increased (new monitor?). It could also be that your devices are getting
older.

I suggest that you reseat your SCSI card and check that all the cables
are firmly seated. Review the bus cabling. You should be using the proper
cable type and layout. If you've kludged it anywhere, this is your system
telling you it's time to get the proper cables, layout and termination.
Fast devices need fast cables. Use active termination. If you are using a
separate terminator, make sure termination is not turned on on any
device. If you are using device termination on the last device, make sure
it's the only device with termination turned on. Sanity check things, in
other words. Some PC vendors ship 80 or 128 pin disks with a 68 pin
adaptor which can slip loose, for example.

The advice to split your bus into a dedicated disk bus and a tape/CD-ROM
bus is good. Disks are usually a faster SCSI type than tape and CD-ROM.
When you mix device types, the bus slows down. Do not mix 8 bit and 16
bit bus devices (narrow and wide). The standard says this will work but
implementation is spotty. Even on high end gear like a $1200 Sun SCSI
card, I've seen problems.

Sorry if this seems obvious.

Hattie Rouge

> -----Original Message-----
> From: psyche-list-admin@xxxxxxxxxx
> [mailto:psyche-list-admin@xxxxxxxxxx] On Behalf Of David I. Bell
> Sent: Monday, June 02, 2003 11:45 PM
> To: psyche-list@xxxxxxxxxx
> Subject: SCSI Problem
>
>
> Kind Readers,
>
> Troubles recently started with my system.
> I'm running RH 8.0 (2.4.18-26.8.0smp) on a dual CPU P-II/200Mhz
> system. It's been up and running for 5 months or so. I just noticed
> that the system periodically hangs for several seconds (sometimes up
> to a minute or more) with the disk activity light solidly on. The
> system usually unblocks itself if I wait patiently. I've attached
> some text from /var/log/messages below.
>
> Does anyone know what the problem might be? I have a SCSI disk at
> SCSI ID 0, a tape device at SCSI ID 3, and a CD-ROM at SCSI ID 5. The
> tape and the CD-ROM were not in use at the time of the error. Do you
> think this might be a failing SCSI card or is it a failing disk
> drive? Could it be an O/S induced hang -- deadlock of some kind
> related to SMP?
>
> ==============================================================
> Jun 2 14:14:59 igor kernel: scsi0:0:0:0: Attempting to queue an ABORT message
> Jun 2 14:14:59 igor kernel: scsi0: Dumping Card State in Message-out phase, at SEQADDR 0x15f
> Jun 2 14:14:59 igor kernel: ACCUM = 0xa0, SINDEX = 0x61, DINDEX = 0xc0, ARG_2 = 0xf
> Jun 2 14:14:59 igor kernel: HCNT = 0x0 SCBPTR = 0xf
> Jun 2 14:14:59 igor kernel: SCSISEQ = 0x12, SBLKCTL = 0x0
> Jun 2 14:14:59 igor kernel: DFCNTRL = 0x4, DFSTATUS = 0x6d
> Jun 2 14:14:59 igor kernel: LASTPHASE = 0xa0, SCSISIGI = 0xb6, SXFRCTL0 = 0x88
> Jun 2 14:14:59 igor kernel: SSTAT0 = 0x7, SSTAT1 = 0x3
> Jun 2 14:14:59 igor kernel: STACK == 0xe4, 0xe4, 0x159, 0x189
> Jun 2 14:14:59 igor kernel: SCB count = 120
> Jun 2 14:14:59 igor kernel: Kernel NEXTQSCB = 43
> Jun 2 14:14:59 igor kernel: Card NEXTQSCB = 67
> Jun 2 14:14:59 igor kernel: QINFIFO entries: 67 51
> Jun 2 14:14:59 igor kernel: Waiting Queue entries:
> Jun 2 14:14:59 igor kernel: Disconnected Queue entries:
> Jun 2 14:14:59 igor kernel: QOUTFIFO entries:
> Jun 2 14:14:59 igor kernel: Sequencer Free SCB List: 14 3 1 9 0 10 8 6 5 4 11 7 13 2 12
> Jun 2 14:14:59 igor kernel: Sequencer SCB Info: 0(c 0x68, s 0x7, l 0, t 0xff) 1(c 0x68, s 0x7, l 0, t 0xff) 2(c 0x68, s 0x7, l 0, t 0xff) 3(c 0x68, s 0x7, l 0, t 0xff) 4(c 0x68, s 0x7, l 0, t 0xff) 5(c 0x68, s 0x7, l 0, t 0xff) 6(c 0x68, s 0x7, l 0, t 0xff) 7(c 0x68, s 0x7, l 0, t 0xff) 8(c 0x68, s 0x7, l 0, t 0xff) 9(c 0x68, s 0x7, l 0, t 0xff) 10(c 0x68, s 0x7, l 0, t 0xff) 11(c 0x68, s 0x7, l 0, t 0xff) 12(c 0x68, s 0x7, l 0, t 0xff) 13(c 0x68, s 0x7, l 0, t 0xff) 14(c 0x0, s 0x57, l 0, t 0xff) 15(c 0x0, s 0x57, l 0, t 0x3d)
> Jun 2 14:14:59 igor kernel: Pending list: 51(c 0x68, s 0x7, l 0), 67(c 0x68, s 0x7, l 0), 61(c 0x0, s 0x57, l 0)
> Jun 2 14:14:59 igor kernel: Kernel Free SCB list: 31 49 59 10 8 33 11 9 20 1 24 41 16 3 5 35 25 39 46 4 48 0 26 38 58 45 44 22 17 21 15 29 40 6 55 12 14 30 28 32 7 54 19 56 42 62 37 34 13 18 52 63 50 53 119 27 36 23 60 57 2 47 112 113 114 115 108 109 110 111 104 105 106 107 100 101 102 103 96 97 98 99 92 93 94 95 88 89 90 91 84 85 86 87 80 81 82 83 76 77 78 79 72 73 74 75 68 69 70 71 64 65 66 118 117 116
> Jun 2 14:14:59 igor kernel: Untagged Q(5): 61
> Jun 2 14:14:59 igor kernel: DevQ(0:0:0): 0 waiting
> Jun 2 14:14:59 igor kernel: DevQ(0:3:0): 0 waiting
> Jun 2 14:14:59 igor kernel: DevQ(0:5:0): 0 waiting
> Jun 2 14:14:59 igor kernel: scsi0:0:0:0: Cmd aborted from QINFIFO
> Jun 2 14:15:00 igor kernel: aic7xxx_abort returns 0x2002
> Jun 2 14:15:00 igor kernel: scsi0:0:0:0: Attempting to queue an ABORT message
> Jun 2 14:15:00 igor kernel: scsi0:0:0:0: Command not found
> Jun 2 14:15:00 igor kernel: aic7xxx_abort returns 0x2002
> Jun 2 14:15:00 igor kernel: scsi0:0:5:0: Attempting to queue an ABORT message
> Jun 2 14:15:00 igor kernel: scsi0:0:5:0: Command not found
> Jun 2 14:15:00 igor kernel: aic7xxx_abort returns 0x2002
>
> ==============================================================
> Thanks in advance.
>
> -- David Bell
> dibl@xxxxxxxxxxx
>
>
>
> --
> Psyche-list mailing list
> Psyche-list@xxxxxxxxxx
> https://www.redhat.com/mailman/listinfo/psyche-list
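
To see whether the aborts cluster on one device, the messages quoted
above can be tallied per SCSI address. What follows is only a minimal
Python sketch: the names count_aborts and SAMPLE are made up for
illustration, the pattern assumes the message wording shown in the quoted
log, and reading the address as host:channel:target:lun (so that the
third number is the SCSI ID) is an assumption, not something the log
states.

```python
import re
from collections import Counter

# Assumed message format, taken from the quoted excerpt, e.g.
#   Jun 2 14:14:59 igor kernel: scsi0:0:0:0: Attempting to queue an ABORT message
ABORT_RE = re.compile(r"kernel: (scsi\d+:\d+:\d+:\d+): Attempting to queue")


def count_aborts(lines):
    """Count abort attempts per SCSI address found in an iterable of log lines."""
    counts = Counter()
    for line in lines:
        match = ABORT_RE.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts


# The three abort attempts from the quoted log, plus one unrelated line.
SAMPLE = [
    "Jun 2 14:14:59 igor kernel: scsi0:0:0:0: Attempting to queue an ABORT message",
    "Jun 2 14:14:59 igor kernel: SCB count = 120",
    "Jun 2 14:15:00 igor kernel: scsi0:0:0:0: Attempting to queue an ABORT message",
    "Jun 2 14:15:00 igor kernel: scsi0:0:5:0: Attempting to queue an ABORT message",
]

if __name__ == "__main__":
    # Print addresses from most to least frequently aborted.
    for address, n in count_aborts(SAMPLE).most_common():
        print(address, n)
```

Against the real log you would pass the open /var/log/messages file
instead of SAMPLE. If one address dominates, that device or its cabling
is the first thing to look at; if the aborts are spread across addresses,
that fits a bus-wide problem such as termination better.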