I’m running CentOS 4.3 (kernel 2.6.9-22.ELsmp) on a box, with Windows XP
running as a guest under VMware 5.5.1. There are 3 SCSI disks in this
server: sda for the Linux system, sdb for the VMware virtual disk, and sdc
for other things. Recently I found that something is wrong with the second
disk. When the Windows XP guest copies files from a Samba server (another
box) to its disk, after a while (sometimes 5 hours, sometimes 3 or even 1
hour) the kernel reports that sdb is offline. After a reboot everything is
OK and no filesystem check is run, but the problem comes back every few
hours whenever Windows XP copies a lot of files to its disk. I’m not sure
whether the heavy disk load is what causes this. I ran smartctl to check
whether the disk is overheating, but found nothing: smartctl says the
temperature is OK (27-29 C). Is it a hardware problem? The cable? Is the
disk dying? A kernel problem? A VMware problem? Here is the
dmesg dump from when the problem happened:

Nov 26 08:41:43 server kernel: device eth2 entered promiscuous mode
Nov 26 08:41:43 server kernel: bridge-eth2: enabled promiscuous mode
Nov 27 19:12:07 server kernel: scsi0:0:1:0: Attempting to abort cmd f7a97500: 0x2a 0x0 0xc 0x7b 0x41 0x30 0x0 0x0 0x68 0x0
Nov 27 19:12:07 server kernel: scsi0: At time of recovery, card was not paused
Nov 27 19:12:07 server kernel:
>>>>>>>>>>>>>>>>>> Dump Card State Begins <<<<<<<<<<<<<<<<<
Nov 27 19:12:07 server kernel: scsi0: Dumping Card State at program address 0x26 Mode 0x22
Nov 27 19:12:07 server kernel: Card was paused
Nov 27 19:12:07 server kernel: HS_MAILBOX[0x0] INTCTL[0x80] SEQINTSTAT[0x0] SAVED_MODE[0x11]
Nov 27 19:12:07 server kernel: DFFSTAT[0x33] SCSISIGI[0x0] SCSIPHASE[0x0] SCSIBUS[0x0]
Nov 27 19:12:07 server kernel: LASTPHASE[0x1] SCSISEQ0[0x0] SCSISEQ1[0x12] SEQCTL0[0x0]
Nov 27 19:12:07 server kernel: SEQINTCTL[0x0] SEQ_FLAGS[0x0] SEQ_FLAGS2[0x0] SSTAT0[0x0]
Nov 27 19:12:07 server kernel: SSTAT1[0x0] SSTAT2[0x0] SSTAT3[0x0] PERRDIAG[0x0]
Nov 27 19:12:07 server kernel: SIMODE1[0xa4] LQISTAT0[0x0] LQISTAT1[0x0] LQISTAT2[0x0]
Nov 27 19:12:07 server kernel: LQOSTAT0[0x0] LQOSTAT1[0x0] LQOSTAT2[0xe1]
Nov 27 19:12:07 server kernel:
Nov 27 19:12:58 server kernel: SCB Count = 12 CMDS_PENDING = 4 LASTSCB 0x6 CURRSCB 0x3 NEXTSCB 0xff40
Nov 27 19:12:58 server kernel: qinstart = 23816 qinfifonext = 23816
Nov 27 19:12:58 server kernel: QINFIFO:
Nov 27 19:12:58 server kernel: WAITING_TID_QUEUES:
Nov 27 19:12:58 server kernel: Pending list:
Nov 27 19:12:58 server kernel: 9 FIFO_USE[0x0] SCB_CONTROL[0x60] SCB_SCSIID[0x17]
Nov 27 19:12:58 server kernel: 0 FIFO_USE[0x0] SCB_CONTROL[0x60] SCB_SCSIID[0x17]
Nov 27 19:12:58 server kernel: 7 FIFO_USE[0x0] SCB_CONTROL[0x60] SCB_SCSIID[0x17]
Nov 27 19:12:58 server kernel: 5 FIFO_USE[0x0] SCB_CONTROL[0x60] SCB_SCSIID[0x17]
Nov 27 19:12:58 server kernel: Total 4
Nov 27 19:12:58 server kernel: Kernel Free SCB list: 3 6 11 2 4 1 10 8
Nov 27 19:12:58 server kernel: Sequencer Complete DMA-inprog list:
Nov 27 19:12:58 server kernel: Sequencer Complete list:
Nov 27 19:12:58 server kernel: Sequencer DMA-Up and Complete list:
Nov 27 19:12:58 server kernel:
Nov 27 19:12:58 server kernel: scsi0: FIFO0 Free, LONGJMP == 0x8252, SCB 0x3
Nov 27 19:12:58 server kernel: SEQIMODE[0x3f] SEQINTSRC[0x0] DFCNTRL[0x4] DFSTATUS[0x89]
Nov 27 19:12:58 server kernel: SG_CACHE_SHADOW[0x2] SG_STATE[0x0] DFFSXFRCTL[0x0]
Nov 27 19:12:58 server kernel: SOFFCNT[0x0] MDFFSTAT[0x5] SHADDR = 0x00, SHCNT = 0x0
Nov 27 19:12:58 server kernel: HADDR = 0x00, HCNT = 0x0 CCSGCTL[0x10]
Nov 27 19:12:58 server kernel: scsi0: FIFO1 Free, LONGJMP == 0x8063, SCB 0x3
Nov 27 19:12:58 server kernel: SEQIMODE[0x3f] SEQINTSRC[0x0] DFCNTRL[0x0] DFSTATUS[0x89]
Nov 27 19:12:58 server kernel: SG_CACHE_SHADOW[0x2] SG_STATE[0x0] DFFSXFRCTL[0x0]
Nov 27 19:12:58 server kernel: SOFFCNT[0x0] MDFFSTAT[0x5] SHADDR = 0x00, SHCNT = 0x0
Nov 27 19:12:58 server kernel: HADDR = 0x00, HCNT = 0x0 CCSGCTL[0x10]
Nov 27 19:12:58 server kernel: LQIN: 0x8 0x0 0x0 0x3 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0
Nov 27 19:12:58 server kernel: scsi0: LQISTATE = 0x0, LQOSTATE = 0x0, OPTIONMODE = 0x52
Nov 27 19:12:58 server kernel: scsi0: OS_SPACE_CNT = 0x20 MAXCMDCNT = 0x1
Nov 27 19:12:58 server kernel: SIMODE0[0xc]
Nov 27 19:12:58 server kernel: CCSCBCTL[0x0]
Nov 27 19:12:58 server kernel: scsi0: REG0 == 0x3, SINDEX = 0x102, DINDEX = 0x102
Nov 27 19:12:58 server kernel: scsi0: SCBPTR == 0x3, SCB_NEXT == 0xff40, SCB_NEXT2 == 0xff86
Nov 27 19:12:58 server kernel: CDB 2a 0 1 80 8 6c
Nov 27 19:12:58 server kernel: STACK: 0x14 0x0 0x0 0x0 0x0 0x0 0x0 0x0
Nov 27 19:12:58 server kernel:
<<<<<<<<<<<<<<<<< Dump Card State Ends >>>>>>>>>>>>>>>>>>
Nov 27 19:12:58 server kernel: DevQ(0:0:0): 0 waiting
Nov 27 19:12:58 server kernel: DevQ(0:1:0): 0 waiting
Nov 27 19:12:58 server kernel: DevQ(0:2:0): 0 waiting
Nov 27 19:12:58 server kernel: (scsi0:A:1:0): Device is disconnected, re-queuing SCB
Nov 27 19:12:58 server kernel: Recovery code sleeping
Nov 27 19:12:58 server kernel: (scsi0:A:1:0): Task Management Func 0x1 Complete
Nov 27 19:12:58 server kernel: Recovery SCB completes
Nov 27 19:12:58 server kernel: Recovery code awake
Nov 27 19:12:58 server kernel: scsi0:0:1:0: Attempting to abort cmd f7a97500: 0x0 0x0 0x0 0x0 0x0 0x0
Nov 27 19:12:58 server kernel: scsi0: At time of recovery, card was not paused
Nov 27 19:12:58 server kernel:
>>>>>>>>>>>>>>>>>> Dump Card State Begins <<<<<<<<<<<<<<<<<
Nov 27 19:12:58 server kernel: scsi0: Dumping Card State at program address 0x24 Mode 0x0
Nov 27 19:12:58 server kernel: Card was paused
Nov 27 19:12:58 server kernel: HS_MAILBOX[0x0] INTCTL[0x80] SEQINTSTAT[0x0] SAVED_MODE[0x11]
Nov 27 19:12:58 server kernel: DFFSTAT[0x33] SCSISIGI[0x0] SCSIPHASE[0x0] SCSIBUS[0x0]
Nov 27 19:12:58 server kernel: LASTPHASE[0x1] SCSISEQ0[0x0] SCSISEQ1[0x12] SEQCTL0[0x0]
Nov 27 19:12:58 server kernel: SEQINTCTL[0x0] SEQ_FLAGS[0x0] SEQ_FLAGS2[0x0] SSTAT0[0x0]
Nov 27 19:12:58 server kernel: SSTAT1[0x8] SSTAT2[0x0] SSTAT3[0x0] PERRDIAG[0x0]
Nov 27 19:12:58 server kernel: SIMODE1[0xa4] LQISTAT0[0x0] LQISTAT1[0x0] LQISTAT2[0x0]
Nov 27 19:12:58 server kernel: LQOSTAT0[0x0] LQOSTAT1[0x0] LQOSTAT2[0xe1]
Nov 27 19:12:58 server kernel:
Nov 27 19:12:58 server kernel: SCB Count = 12 CMDS_PENDING = 4 LASTSCB 0x6 CURRSCB 0x5 NEXTSCB 0xffc0
Nov 27 19:12:58 server kernel: qinstart = 23818 qinfifonext = 23818
Nov 27 19:12:58 server kernel: QINFIFO:
Nov 27 19:12:58 server kernel: WAITING_TID_QUEUES:
Nov 27 19:12:58 server kernel: Pending list:
Nov 27 19:12:58 server kernel: 5 FIFO_USE[0x0] SCB_CONTROL[0x60] SCB_SCSIID[0x17]
Nov 27 19:12:58 server kernel: 9 FIFO_USE[0x0] SCB_CONTROL[0x60] SCB_SCSIID[0x17]
Nov 27 19:12:58 server kernel: 0 FIFO_USE[0x0] SCB_CONTROL[0x60] SCB_SCSIID[0x17]
Nov 27 19:12:58 server kernel: 7 FIFO_USE[0x0] SCB_CONTROL[0x60] SCB_SCSIID[0x17]
Nov 27 19:12:58 server kernel: Total 4
Nov 27 19:12:58 server kernel: Kernel Free SCB list: 3 6 11 2 4 1 10 8
Nov 27 19:12:58 server kernel: Sequencer Complete DMA-inprog list:
Nov 27 19:12:58 server kernel: Sequencer Complete list:
Nov 27 19:12:58 server kernel: Sequencer DMA-Up and Complete list:
Nov 27 19:12:58 server kernel:
Nov 27 19:12:58 server kernel: scsi0: FIFO0 Free, LONGJMP == 0x8252, SCB 0x3
Nov 27 19:12:58 server kernel: SEQIMODE[0x3f] SEQINTSRC[0x0] DFCNTRL[0x4] DFSTATUS[0x89]
Nov 27 19:12:58 server kernel: SG_CACHE_SHADOW[0x2] SG_STATE[0x0] DFFSXFRCTL[0x0]
Nov 27 19:12:58 server kernel: SOFFCNT[0x0] MDFFSTAT[0x5] SHADDR = 0x00, SHCNT = 0x0
Nov 27 19:12:58 server kernel: HADDR = 0x00, HCNT = 0x0 CCSGCTL[0x10]
Nov 27 19:12:58 server kernel: scsi0: FIFO1 Free, LONGJMP == 0x8063, SCB 0x3
Nov 27 19:12:58 server kernel: SEQIMODE[0x3f] SEQINTSRC[0x0] DFCNTRL[0x0] DFSTATUS[0x89]
Nov 27 19:12:58 server kernel: SG_CACHE_SHADOW[0x2] SG_STATE[0x0] DFFSXFRCTL[0x0]
Nov 27 19:12:58 server kernel: SOFFCNT[0x0] MDFFSTAT[0x5] SHADDR = 0x00, SHCNT = 0x0
Nov 27 19:12:58 server kernel: HADDR = 0x00, HCNT = 0x0 CCSGCTL[0x10]
Nov 27 19:12:58 server kernel: LQIN: 0x8 0x0 0x0 0x3 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0
Nov 27 19:12:58 server kernel: scsi0: LQISTATE = 0x0, LQOSTATE = 0x0, OPTIONMODE = 0x52
Nov 27 19:12:58 server kernel: scsi0: OS_SPACE_CNT = 0x20 MAXCMDCNT = 0x1
Nov 27 19:12:58 server kernel: SIMODE0[0xc]
Nov 27 19:12:58 server kernel: CCSCBCTL[0x4]
Nov 27 19:12:59 server kernel: scsi0: REG0 == 0x6b60, SINDEX = 0x104, DINDEX = 0x104
......
Nov 27 19:12:59 server kernel: DevQ(0:0:0): 0 waiting
Nov 27 19:12:59 server kernel: DevQ(0:1:0): 0 waiting
Nov 27 19:12:59 server kernel: DevQ(0:2:0): 0 waiting
Nov 27 19:12:59 server kernel: (scsi0:A:1:0): Device is disconnected, re-queuing SCB
Nov 27 19:12:59 server kernel: Recovery code sleeping
Nov 27 19:12:59 server kernel: (scsi0:A:1:0): Abort Tag Message Sent
Nov 27 19:12:59 server kernel: (scsi0:A:1:0): SCB 5 - Abort Completed.
Nov 27 19:12:59 server kernel: Recovery SCB completes
Nov 27 19:12:59 server kernel: found == 0x1
Nov 27 19:12:59 server kernel: Recovery code awake
Nov 27 19:12:59 server kernel: Recovery code sleeping
Nov 27 19:12:59 server kernel: (scsi0:A:1:0): Bus Device Reset Message Sent
Nov 27 19:12:59 server kernel: Recovery SCB completes
Nov 27 19:12:59 server kernel: scsi0: Bus Device Reset on A:1. 1 SCBs aborted
Nov 27 19:12:59 server kernel: Recovery code awake
Nov 27 19:12:59 server kernel: scsi0: Device reset returning 0x2002
Nov 27 19:12:59 server kernel: scsi: Device offlined - not ready after error recovery: host 0 channel 0 id 1 lun 0
Nov 27 19:12:59 server kernel: SCSI error : <0 0 1 0> return code = 0x10000
Nov 27 19:12:59 server kernel: end_request: I/O error, dev sdb, sector 209404208
Nov 27 19:12:59 server kernel: Buffer I/O error on device sdb2, logical block 10110526
Nov 27 19:12:59 server kernel: lost page write due to I/O error on sdb2
Nov 27 19:12:59 server kernel: scsi0 (1:0): rejecting I/O to offline device
Nov 27 19:12:59 server kernel: Buffer I/O error on device sdb2, logical block 10110527
Nov 27 19:12:59 server kernel: lost page write due to I/O error on sdb2
Nov 27 19:12:59 server kernel: Buffer I/O error on device sdb2, logical block 10110528
Nov 27 19:12:59 server kernel: lost page write due to I/O error on sdb2
Nov 27 19:12:59 server kernel: Buffer I/O error on device sdb2, logical block 10110529
Nov 27 19:12:59 server kernel: lost page write due to I/O error on sdb2
Nov 27 19:12:59 server kernel: Buffer I/O error on device sdb2, logical block 10110530
Nov 27 19:12:59 server kernel: lost page write due to I/O error on sdb2
Nov 27 19:12:59 server kernel: Buffer I/O error on device sdb2, logical block 10110531
Nov 27 19:12:59 server kernel: lost page write due to I/O error on sdb2
Nov 27 19:12:59 server kernel: Buffer I/O error on device sdb2, logical block 10110532
Nov 27 19:12:59 server kernel: lost page write due to I/O error on sdb2
Nov 27 19:12:59 server kernel: Buffer I/O error on device sdb2, logical block 10110533
Nov 27 19:12:59 server kernel: lost page write due to I/O error on sdb2
Nov 27 19:12:59 server kernel: Buffer I/O error on device sdb2, logical block 10110534
Nov 27 19:12:59 server kernel: lost page write due to I/O error on sdb2
Nov 27 19:12:59 server kernel: Buffer I/O error on device sdb2, logical block 10110535
Nov 27 19:12:59 server kernel: lost page write due to I/O error on sdb2
Nov 27 19:12:59 server kernel: scsi0 (1:0): rejecting I/O to offline device
Nov 27 19:12:59 server kernel: Aborting journal on device sdb2.
Nov 27 19:12:59 server kernel: scsi0 (1:0): rejecting I/O to offline device
Nov 27 19:12:59 server kernel: ext3_abort called.
Nov 27 19:12:59 server kernel: EXT3-fs error (device sdb2): ext3_journal_start_sb: Detected aborted journal
Nov 27 19:12:59 server kernel: Remounting filesystem read-only
Nov 27 19:12:59 server kernel: scsi0 (1:0): rejecting I/O to offline device
Nov 27 19:12:59 server kernel: SCSI error : <0 0 1 0> return code = 0x10000
Nov 27 19:12:59 server kernel: end_request: I/O error, dev sdb, sector 209368912
Nov 27 19:12:59 server kernel: scsi0 (1:0): rejecting I/O to offline device
Nov 27 19:12:59 server kernel: SCSI error : <0 0 1 0> return code = 0x10000
Nov 27 19:12:59 server kernel: end_request: I/O error, dev sdb, sector 209369760
Nov 27 19:12:59 server kernel: scsi0 (1:0): rejecting I/O to offline device
Nov 27 19:12:59 server kernel: SCSI error : <0 0 1 0> return code = 0x10000
Nov 27 19:12:59 server kernel: end_request: I/O error, dev sdb, sector 209370072
Nov 27 19:12:59 server kernel: scsi0 (1:0): rejecting I/O to offline device
Nov 27 19:13:29 server kernel: scsi0 (1:0): rejecting I/O to offline device
Nov 27 19:13:29 server kernel: printk: 5631 messages suppressed.
Nov 27 19:13:29 server kernel: Buffer I/O error on device sdb2, logical block 9928706
Nov 27 19:13:29 server kernel: lost page write due to I/O error on sdb2
Nov 27 19:18:54 server kernel: device eth2 left promiscuous mode
Nov 27 19:18:54 server kernel: bridge-eth2: disabled promiscuous mode
Nov 27 19:18:54 server kernel: device eth1 left promiscuous mode
Nov 27 19:18:54 server kernel: bridge-eth1: disabled promiscuous mode
Nov 27 19:18:54 server kernel: scsi0 (1:0): rejecting I/O to offline device
Nov 27 19:18:54 server kernel: EXT3-fs error (device sdb2): ext3_find_entry: reading directory #4964353 offset 0
Nov 27 19:18:54 server kernel:
Nov 27 19:20:19 server kernel: scsi0 (1:0): rejecting I/O to offline device
Nov 27 19:20:19 server kernel: Buffer I/O error on device sdb2, logical block 6
Nov 27 19:20:19 server kernel: Buffer I/O error on device sdb2, logical block 7
Nov 27 19:20:19 server kernel: Buffer I/O error on device sdb2, logical block 8
Nov 27 19:20:19 server kernel: Buffer I/O error on device sdb2, logical block 9
Nov 27 19:20:19 server kernel: Buffer I/O error on device sdb2, logical block 10
Nov 27 19:20:19 server kernel: Buffer I/O error on device sdb2, logical block 11
Nov 27 19:20:19 server kernel: Buffer I/O error on device sdb2, logical block 12
Nov 27 19:20:19 server kernel: Buffer I/O error on device sdb2, logical block 13
Nov 27 19:20:19 server kernel: Buffer I/O error on device sdb2, logical block 14
Nov 27 19:20:19 server kernel: Buffer I/O error on device sdb2, logical block 15
Nov 27 19:22:39 server kernel: scsi0 (1:0): rejecting I/O to offline device
Nov 27 19:22:39 server kernel: Buffer I/O error on device sdb1, logical block 5
Nov 27 19:22:39 server kernel: Buffer I/O error on device sdb1, logical block 6
Nov 27 19:22:39 server kernel: Buffer I/O error on device sdb1, logical block 7
Nov 27 19:22:39 server kernel: Buffer I/O error on device sdb1, logical block 8
Nov 27 19:22:39 server kernel: Buffer I/O error on device sdb1, logical block 9
Nov 27 19:22:39 server kernel: Buffer I/O error on device sdb1, logical block 10
Nov 27 19:22:39 server kernel: Buffer I/O error on device sdb1, logical block 11
Nov 27 19:22:39 server kernel: Buffer I/O error on device sdb1, logical block 12
Nov 27 19:22:39 server kernel: Buffer I/O error on device sdb1, logical block 13
Nov 27 19:22:39 server kernel: Buffer I/O error on device sdb1, logical block 14
Nov 27 19:22:39 server kernel: Buffer I/O error on device sdb1, logical block 15
Nov 27 19:22:39 server kernel: scsi0 (1:0): rejecting I/O to offline device
Nov 27 19:22:39 server kernel: Buffer I/O error on device sdb2, logical block 1033
Nov 27 19:22:39 server kernel: Buffer I/O error on device sdb2, logical block 1034
Nov 27 19:22:39 server kernel: Buffer I/O error on device sdb2, logical block 1035
Nov 27 19:22:39 server kernel: Buffer I/O error on device sdb2, logical block 1036
Nov 27 19:22:39 server kernel: Buffer I/O error on device sdb2, logical block 1037
Nov 27 19:22:39 server kernel: Buffer I/O error on device sdb2, logical block 1038
Nov 27 19:22:39 server kernel: Buffer I/O error on device sdb2, logical block 1039
Nov 27 19:22:39 server kernel: Buffer I/O error on device sdb2, logical block 1040
Nov 27 19:22:39 server kernel: Buffer I/O error on device sdb2, logical block 1041
Nov 27 19:22:39 server kernel: Buffer I/O error on device sdb2, logical block 1042
Nov 27 19:22:39 server kernel: scsi0 (1:0): rejecting I/O to offline device
Nov 27 19:22:39 server kernel: Buffer I/O error on device sdb2, logical block 1551
Nov 27 19:22:39 server kernel: scsi0 (1:0): rejecting I/O to offline device
Nov 27 19:22:39 server kernel: Buffer I/O error on device sdb2, logical block 1554

And here is the smartctl
output:

[root@server ~]# smartctl -a /dev/sdb
smartctl version 5.33 [i686-redhat-linux-gnu] Copyright (C) 2002-4 Bruce Allen
Home page is http://smartmontools.sourceforge.net/

Device: SEAGATE ST3146807LC Version: 0007
Serial number: 3HY8YZNY00007613S1HZ
Device type: disk
Transport protocol: Parallel SCSI (SPI-4)
Local Time is: Wed Nov 29 19:14:25 2006 CST
Device supports SMART and is Enabled
Temperature Warning Enabled
SMART Health Status: OK

Current Drive Temperature:     29 C
Drive Trip Temperature:        68 C
Vendor (Seagate) cache information
  Blocks sent to initiator = 40668853
  Blocks received from initiator = 2255578307
  Blocks read from cache and sent to initiator = 15359133
  Number of read and write commands whose size <= segment size = 22149142
  Number of read and write commands whose size > segment size = 1651834
Vendor (Seagate/Hitachi) factory information
  number of hours powered up = 2001.53
  number of minutes until next internal SMART test = 28

Error counter log:
           Errors Corrected by          Total    Correction     Gigabytes    Total
                EEC         rereads/    errors   algorithm      processed    uncorrected
           fast | delayed   rewrites  corrected  invocations    [10^9 bytes] errors
read:      5754         1          0       5755         5987         165.985      0
write:        0         0          7          7        15847          69.478      0

Non-medium error count:     6491
Error Events logging not supported

[GLTSD (Global Logging Target Save Disable) set. Enable Save with '-S on']
SMART Self-test log
Num  Test              Status                 segment  LifeTime  LBA_first_err [SK ASC ASQ]
     Description                              number   (hours)
# 1  Background long   Failed in segment -->       -      2001    0xc83206c [0x3 0x11 0x0]
# 2  Background short  Completed                   -         2            - [-   -    -]
# 3  Background short  Completed                   -         2            - [-   -    -]

Long (extended) Self Test duration: 3072 seconds [51.2 minutes]

I’m running a long self-test on this disk now. Thanks!
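For anyone comparing the numbers in the two dumps: the "Attempting to abort cmd" line prints the SCSI CDB of the command being aborted, and it can be decoded by hand. Below is a minimal Python sketch, assuming the standard READ(10)/WRITE(10) CDB layout (opcode in byte 0, big-endian LBA in bytes 2-5, transfer length in bytes 7-8); `decode_rw10` is a hypothetical helper written for this illustration, not part of smartctl or the aic79xx driver. Read that way, the first aborted command is a 104-block WRITE(10) starting at LBA 209404208, the same number as the first "end_request: I/O error, dev sdb, sector" line, and the self-test's LBA_first_err 0xc83206c is LBA 209920108.

```python
# Minimal sketch: decode a 10-byte READ(10)/WRITE(10) CDB as printed in the
# "Attempting to abort cmd ...: 0x2a 0x0 0xc ..." log line.
# Assumes the standard SCSI CDB layout; decode_rw10 is a hypothetical helper.

OPCODES = {0x28: "READ(10)", 0x2A: "WRITE(10)"}


def decode_rw10(cdb_text):
    """Return (command name, starting LBA, number of blocks) for a CDB string."""
    cdb = [int(tok, 16) for tok in cdb_text.split()]
    if len(cdb) != 10 or cdb[0] not in OPCODES:
        raise ValueError("not a 10-byte READ(10)/WRITE(10) CDB")
    lba = int.from_bytes(bytes(cdb[2:6]), "big")     # bytes 2-5: logical block address
    blocks = int.from_bytes(bytes(cdb[7:9]), "big")  # bytes 7-8: transfer length
    return OPCODES[cdb[0]], lba, blocks


if __name__ == "__main__":
    # CDB from the first "Attempting to abort cmd f7a97500" line in the log
    name, lba, blocks = decode_rw10("0x2a 0x0 0xc 0x7b 0x41 0x30 0x0 0x0 0x68 0x0")
    print(name, lba, blocks)  # WRITE(10) 209404208 104
    # LBA_first_err reported by the failed "Background long" self-test
    print(int("c83206c", 16))  # 209920108
```

In the self-test entry, sense key 0x3 with ASC 0x11 is the standard SCSI code pair for MEDIUM ERROR, UNRECOVERED READ ERROR.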
_______________________________________________
CentOS mailing list
CentOS@xxxxxxxxxx
http://lists.centos.org/mailman/listinfo/centos