Just another follow-up: after we swapped the SCSI host adapter, the storage seems to be working fine (no more read/write or access/mount errors). However, every so often (e.g. once or twice in 2 days), even though it is working fine, we still get some error messages, which we guess are similar to the messages in the previous posts. We do not know how critical this is, but maybe you guys could give some valuable advice or input (e.g. whether we should change the cables as well).

TIA. Cheers!
-Ikmal

Latest 'dmesg' excerpts, which appeared once in the last 2 days:

scsi5:0:0:0: Attempting to abort cmd 0000010002fa5380: 0x28 0x0 0x0 0x0 0x1 0x3f 0x0 0x0 0x8 0x0
scsi5: At time of recovery, card was not paused
>>>>>>>>>>>>>>>>>> Dump Card State Begins <<<<<<<<<<<<<<<<<
scsi5: Dumping Card State at program address 0x5 Mode 0x33
Card was paused
HS_MAILBOX[0x0] INTCTL[0x80] SEQINTSTAT[0x0] SAVED_MODE[0x11]
DFFSTAT[0x33] SCSISIGI[0x0] SCSIPHASE[0x0] SCSIBUS[0x0]
LASTPHASE[0x1] SCSISEQ0[0x0] SCSISEQ1[0x12] SEQCTL0[0x0]
SEQINTCTL[0x0] SEQ_FLAGS[0xc0] SEQ_FLAGS2[0x0] SSTAT0[0x0]
SSTAT1[0x8] SSTAT2[0x0] SSTAT3[0x0] PERRDIAG[0xc0]
SIMODE1[0xa4] LQISTAT0[0x0] LQISTAT1[0x0] LQISTAT2[0x0]
LQOSTAT0[0x0] LQOSTAT1[0x0] LQOSTAT2[0x0]
SCB Count = 4 CMDS_PENDING = 1 LASTSCB 0xffff CURRSCB 0x2 NEXTSCB 0x0
qinstart = 59 qinfifonext = 59
QINFIFO:
WAITING_TID_QUEUES:
Pending list:
2 FIFO_USE[0x0] SCB_CONTROL[0x64] SCB_SCSIID[0x7]
Total 1
Kernel Free SCB list: 3 1 0
Sequencer Complete DMA-inprog list:
Sequencer Complete list:
Sequencer DMA-Up and Complete list:
scsi5: FIFO0 Free, LONGJMP == 0x80ff, SCB 0x0
SEQIMODE[0x3f] SEQINTSRC[0x0] DFCNTRL[0x0] DFSTATUS[0x89]
SG_CACHE_SHADOW[0x2] SG_STATE[0x0] DFFSXFRCTL[0x0] SOFFCNT[0x0]
MDFFSTAT[0x5] SHADDR = 0x00, SHCNT = 0x0
HADDR = 0x00, HCNT = 0x0 CCSGCTL[0x10]
scsi5: FIFO1 Free, LONGJMP == 0x81d8, SCB 0x3
SEQIMODE[0x3f] SEQINTSRC[0x0] DFCNTRL[0x0] DFSTATUS[0x89]
SG_CACHE_SHADOW[0x2] SG_STATE[0x0] DFFSXFRCTL[0x0] SOFFCNT[0x0]
MDFFSTAT[0x5] SHADDR = 0x00, SHCNT = 0x0
HADDR = 0x00, HCNT = 0x0 CCSGCTL[0x10]
LQIN: 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0
scsi5: LQISTATE = 0x0, LQOSTATE = 0x0, OPTIONMODE = 0x52
scsi5: OS_SPACE_CNT = 0x20 MAXCMDCNT = 0x0
SIMODE0[0xc] CCSCBCTL[0x0]
scsi5: REG0 == 0xffff, SINDEX = 0x1e0, DINDEX = 0xe1
scsi5: SCBPTR == 0x3, SCB_NEXT == 0x2, SCB_NEXT2 == 0x2
CDB 28 0 0 80 19 7c
STACK: 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0
<<<<<<<<<<<<<<<<< Dump Card State Ends >>>>>>>>>>>>>>>>>>
DevQ(0:0:0): 0 waiting
(scsi5:A:0:0): Device is disconnected, re-queuing SCB
Recovery code sleeping
Recovery SCB completes
Recovery code awake
scsi5: Transmission error detected
LQISTAT1[0x0] LASTPHASE[0x1] SCSISIGI[0x0] PERRDIAG[0x1]
>>>>>>>>>>>>>>>>>> Dump Card State Begins <<<<<<<<<<<<<<<<<
scsi5: Dumping Card State at program address 0x26 Mode 0x11
Card was paused
HS_MAILBOX[0x0] INTCTL[0x80] SEQINTSTAT[0x0] SAVED_MODE[0x11]
DFFSTAT[0x33] SCSISIGI[0x1a] SCSIPHASE[0x1] SCSIBUS[0xff]
LASTPHASE[0x1] SCSISEQ0[0x40] SCSISEQ1[0x12] SEQCTL0[0x0]
SEQINTCTL[0x0] SEQ_FLAGS[0xc0] SEQ_FLAGS2[0x0] SSTAT0[0x10]
SSTAT1[0x11] SSTAT2[0x0] SSTAT3[0x0] PERRDIAG[0x0]
SIMODE1[0xac] LQISTAT0[0x0] LQISTAT1[0x0] LQISTAT2[0x0]
LQOSTAT0[0x0] LQOSTAT1[0x0] LQOSTAT2[0x0]
SCB Count = 4 CMDS_PENDING = 1 LASTSCB 0xffff CURRSCB 0x2 NEXTSCB 0x0
qinstart = 61 qinfifonext = 61
QINFIFO:
WAITING_TID_QUEUES:
0 ( 0x2 )
Pending list:
2 FIFO_USE[0x0] SCB_CONTROL[0x50] SCB_SCSIID[0x7]
Total 1
Kernel Free SCB list: 3 1 0
Sequencer Complete DMA-inprog list:
Sequencer Complete list:
Sequencer DMA-Up and Complete list:
scsi5: FIFO0 Free, LONGJMP == 0x80ff, SCB 0x0
SEQIMODE[0x3f] SEQINTSRC[0x0] DFCNTRL[0x0] DFSTATUS[0x89]
SG_CACHE_SHADOW[0x2] SG_STATE[0x0] DFFSXFRCTL[0x0] SOFFCNT[0x1]
MDFFSTAT[0x5] SHADDR = 0x00, SHCNT = 0x0
HADDR = 0x00, HCNT = 0x0 CCSGCTL[0x10]
scsi5: FIFO1 Free, LONGJMP == 0x81d8, SCB 0x3
SEQIMODE[0x3f] SEQINTSRC[0x0] DFCNTRL[0x0] DFSTATUS[0x89]
SG_CACHE_SHADOW[0x2] SG_STATE[0x0] DFFSXFRCTL[0x0] SOFFCNT[0x1]
MDFFSTAT[0x5] SHADDR = 0x00, SHCNT = 0x0
HADDR = 0x00, HCNT = 0x0 CCSGCTL[0x10]
LQIN: 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0
scsi5: LQISTATE = 0x0, LQOSTATE = 0x0, OPTIONMODE = 0x52
scsi5: OS_SPACE_CNT = 0x20 MAXCMDCNT = 0x0
SIMODE0[0xc] CCSCBCTL[0x4]
scsi5: REG0 == 0x3, SINDEX = 0x11d, DINDEX = 0xe1
scsi5: SCBPTR == 0x3, SCB_NEXT == 0x2, SCB_NEXT2 == 0x2
CDB 28 0 0 80 19 7c
STACK: 0x13 0x0 0x0 0x0 0x0 0x0 0x0 0x0
<<<<<<<<<<<<<<<<< Dump Card State Ends >>>>>>>>>>>>>>>>>>
DevQ(0:0:0): 0 waiting
(scsi5:A:0): 80.000MB/s transfers (40.000MHz DT, 16bit)
kjournald starting. Commit interval 5 seconds
EXT3 FS on sdg1, internal journal
EXT3-fs: mounted filesystem with ordered data mode.
kjournald starting. Commit interval 5 seconds
EXT3 FS on sde1, internal journal
EXT3-fs: mounted filesystem with ordered data mode.

On 2/26/07, Hairul Ikmal Mohamad Fuzi <hairul.ikmal@xxxxxxxxx> wrote:
John, Vasiliy,

Thanks for the input. We managed to figure out the problem after swapping all the items. It seems the SCSI host adapter is giving us the problem.

Cheers.
-Ikmal

On 2/25/07, Vasiliy Boulytchev <vasiliy@xxxxxxxxxxxxxxxx> wrote:
> Agreed, I was thinking of cables as well.
>
> See if you get better performance when you replace the cables :)
>
> Good luck
>
> John R Pierce wrote:
> >
> > Hairul Ikmal Mohamad Fuzi wrote:
> >> Hi,
> >>
> >> Currently we are running CentOS 4.x on a 2-way Opteron machine.
> >> This machine, through a SCSI host adapter (Adaptec), is connected to a
> >> 2TB storage unit (an external RAID-5 disk array)
> >>
> >> Until our recent unintentional power trip, everything was fine and
> >> smooth.
> >> We have been experiencing complication accessing the storage ( it
> >> could be either intermittent filesystem error, partition could not be
> >> mounted in read-write mode, unacceptable writing speed, etc ),
> >> especially when we start to 'write' on the storage.
> >>
> >> After a few check, we are suspecting either :
> >>
> >> 1) the storage unit (but the storage control panel did not report any
> >> disk/raidset failure) is failing or,
> >> 2) the SCSI host adapter is failing, or
> >> 3) the filesystem itself is corrupted (we did 'fsck.ext3 -v -f' but it
> >> turned out it did not find any errors)
> >
> >
> > or 4) scsi cabling. I see some scsi transmission errors in there.
> > About the only way I know to diagnose something like this would be to
> > swap parts... I'd swap the controller card and see if the problems go
> > away, then try the cable, then try the storage controller. if one of
> > these things fixes the problem back the other changes out (ie put the
> > original card back, etc).
_______________________________________________ CentOS mailing list CentOS@xxxxxxxxxx http://lists.centos.org/mailman/listinfo/centos
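
For anyone following a similar swap-and-watch approach, here is a rough sketch of how one might count how often these driver messages recur in saved 'dmesg' output, so that the effect of a later change (e.g. new cables) can be compared. The marker strings are copied from the excerpt in this thread; the script name, the function name count_scsi_events and the saved-log workflow are assumptions for illustration only, and it needs Python 3, which may mean running it on a different machine than the CentOS 4.x box.

```python
import sys
from collections import Counter

# Message fragments copied from the dmesg excerpt in this thread.
# Other driver versions may word these messages differently.
MARKERS = (
    "Attempting to abort cmd",
    "Transmission error detected",
    "Dump Card State Begins",
)


def count_scsi_events(log_text):
    """Return a Counter mapping each marker to its number of occurrences."""
    counts = Counter()
    for marker in MARKERS:
        counts[marker] = log_text.count(marker)
    return counts


if __name__ == "__main__" and len(sys.argv) > 1:
    # Hypothetical usage: dmesg > dmesg.txt; python3 scsi_events.py dmesg.txt
    with open(sys.argv[1], errors="replace") as handle:
        text = handle.read()
    for marker, n in count_scsi_events(text).items():
        print(f"{n:4d}  {marker}")
```

Running this against a log saved before and after each part swap gives a crude before/after comparison. It only counts messages and says nothing about which component is at fault.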