mptsas and ioc0: ERRORs

Hello,

my name is Lars and I work in the IT department of a German academy.
We recently bought some expensive equipment to build a SAN with Linux.

If this is the wrong place to ask, please excuse me. (Where should I ask instead?)

The hardware is the following:

monosan:~ # cat /etc/SuSE-release 
openSUSE 10.3 (X86-64)
VERSION = 10.3
monosan:~ # uname -a
Linux monosan 2.6.22.17-0.1-default #1 SMP 2008/02/10 20:01:04 UTC x86_64 x86_64 x86_64 GNU/Linux

2x Dual-Core AMD Opteron(tm) Processor 2216

03:04.0 SCSI storage controller: LSI Logic / Symbios Logic SAS1068 PCI-X Fusion-MPT SAS (rev 02)

monosan:~ # ls -1 /lib/firmware/
ethp_z8e.dat
eth_z8e.dat
myri10ge_ethp_z8e.dat
myri10ge_eth_z8e.dat
myri10ge_rss_ethp_z8e.dat
myri10ge_rss_eth_z8e.dat
rss_ethp_z8e.dat
rss_eth_z8e.dat

The HBA has 2 external SFF-8088 connectors, and each one is connected to one extender board of the same Promise VTrak VTJ610sD disc enclosure. This is meant to provide redundancy, which is why I use multipathing.
The VTrak contains 16 SATA discs connected as sda-sdr (and sds-sdah).
There is one software RAID6 array across 15 discs plus one hot spare.

I get the following errors:
monosan:~ # fgrep "Mar 28" /var/log/messages | egrep "(scsi|mpt)"
Mar 28 21:45:37 monosan kernel: mptscsih: ioc0: attempting task abort! (sc=ffff8100395bd1c0)
Mar 28 21:45:48 monosan kernel: mptbase: Initiating ioc0 recovery
Mar 28 21:45:48 monosan kernel: mptbase: ioc0: WARNING - IOC is in FAULT state!!!
Mar 28 21:45:51 monosan kernel: mptbase: ioc0: Recovered from IOC FAULT
Mar 28 21:46:07 monosan kernel: mptscsih: ioc0: Issue of TaskMgmt failed!
Mar 28 21:46:07 monosan kernel: mptscsih: ioc0: task abort: FAILED (sc=ffff8100395bd1c0)
Mar 28 21:46:07 monosan kernel: mptscsih: ioc0: attempting task abort! (sc=ffff810039654700)
Mar 28 21:46:07 monosan kernel: mptscsih: ioc0: task abort: SUCCESS (sc=ffff810039654700)
Mar 28 21:46:07 monosan kernel: mptscsih: ioc0: attempting task abort! (sc=ffff81003bb87d80)
Mar 28 21:46:07 monosan kernel: mptscsih: ioc0: task abort: SUCCESS (sc=ffff81003bb87d80)
Mar 28 21:46:07 monosan kernel: mptscsih: ioc0: attempting task abort! (sc=ffff8100519cc5c0)
Mar 28 21:46:07 monosan kernel: mptscsih: ioc0: task abort: SUCCESS (sc=ffff8100519cc5c0)
Mar 28 21:46:07 monosan kernel: mptscsih: ioc0: attempting task abort! (sc=ffff8100083504c0)
Mar 28 21:46:07 monosan kernel: mptscsih: ioc0: task abort: SUCCESS (sc=ffff8100083504c0)
Mar 28 21:46:07 monosan kernel: mptscsih: ioc0: attempting task abort! (sc=ffff8100787ccd40)
Mar 28 21:46:07 monosan kernel: mptscsih: ioc0: task abort: SUCCESS (sc=ffff8100787ccd40)
Mar 28 21:46:07 monosan kernel: mptscsih: ioc0: attempting task abort! (sc=ffff81006ee04240)
Mar 28 21:46:07 monosan kernel: mptscsih: ioc0: task abort: SUCCESS (sc=ffff81006ee04240)
Mar 28 21:46:07 monosan kernel: mptscsih: ioc0: attempting task abort! (sc=ffff8100787cc100)
Mar 28 21:46:07 monosan kernel: mptscsih: ioc0: task abort: SUCCESS (sc=ffff8100787cc100)
Mar 28 21:46:07 monosan kernel: mptscsih: ioc0: attempting task abort! (sc=ffff81007cbeb1c0)
Mar 28 21:46:07 monosan kernel: mptscsih: ioc0: task abort: SUCCESS (sc=ffff81007cbeb1c0)
Mar 28 21:46:07 monosan kernel: mptscsih: ioc0: attempting task abort! (sc=ffff81003bb87a00)
Mar 28 21:46:07 monosan kernel: mptscsih: ioc0: task abort: SUCCESS (sc=ffff81003bb87a00)
Mar 28 21:46:07 monosan kernel: mptscsih: ioc0: attempting task abort! (sc=ffff81011bb48300)
Mar 28 21:46:07 monosan kernel: mptscsih: ioc0: task abort: SUCCESS (sc=ffff81011bb48300)
Mar 28 21:46:07 monosan kernel: mptscsih: ioc0: attempting task abort! (sc=ffff81011bb484c0)
Mar 28 21:46:07 monosan kernel: mptscsih: ioc0: task abort: SUCCESS (sc=ffff81011bb484c0)
Mar 28 21:46:07 monosan kernel: mptscsih: ioc0: attempting task abort! (sc=ffff81007cbebc40)
Mar 28 21:46:07 monosan kernel: mptscsih: ioc0: task abort: SUCCESS (sc=ffff81007cbebc40)
Mar 28 21:46:07 monosan kernel: mptscsih: ioc0: attempting task abort! (sc=ffff81003976f0c0)
Mar 28 21:46:07 monosan kernel: mptscsih: ioc0: task abort: SUCCESS (sc=ffff81003976f0c0)
Mar 28 21:46:07 monosan kernel: mptscsih: ioc0: attempting task abort! (sc=ffff810051a7a1c0)
Mar 28 21:46:07 monosan kernel: mptscsih: ioc0: task abort: SUCCESS (sc=ffff810051a7a1c0)
Mar 28 21:46:07 monosan kernel: mptscsih: ioc0: attempting task abort! (sc=ffff810015f93880)
Mar 28 21:46:07 monosan kernel: mptscsih: ioc0: task abort: SUCCESS (sc=ffff810015f93880)
Mar 28 21:46:07 monosan kernel: mptscsih: ioc0: attempting target reset! (sc=ffff8100395bd1c0)
Mar 28 21:46:17 monosan kernel: mptbase: Initiating ioc0 recovery
Mar 28 21:46:17 monosan kernel: mptbase: ioc0: WARNING - IOC is in FAULT state!!!
Mar 28 21:46:21 monosan kernel: mptbase: ioc0: Recovered from IOC FAULT
Mar 28 21:46:36 monosan kernel: mptscsih: ioc0: Issue of TaskMgmt failed!
Mar 28 21:46:36 monosan kernel: mptscsih: ioc0: target reset: FAILED (sc=ffff8100395bd1c0)
Mar 28 21:46:36 monosan kernel: mptscsih: ioc0: attempting bus reset! (sc=ffff8100395bd1c0)
Mar 28 21:46:48 monosan kernel: mptbase: ioc0: ERROR - Doorbell INT timeout (count=4999), IntStatus=80000008!
Mar 28 21:46:48 monosan kernel: mptbase: Initiating ioc0 recovery
Mar 28 21:46:48 monosan kernel: mptbase: ioc0: WARNING - IOC is in FAULT state!!!
Mar 28 21:46:48 monosan kernel: mptbase: ioc0: ERROR - Doorbell INT timeout (count=4999), IntStatus=0!
Mar 28 21:46:49 monosan kernel: mptbase: ioc0: Recovered from IOC FAULT
Mar 28 21:47:05 monosan kernel: mptscsih: ioc0: Issue of TaskMgmt failed!
Mar 28 21:47:05 monosan kernel: mptscsih: ioc0: bus reset: FAILED (sc=ffff8100395bd1c0)
Mar 28 21:47:05 monosan kernel: mptscsih: ioc0: attempting host reset! (sc=ffff8100395bd1c0)
Mar 28 21:47:05 monosan kernel: mptbase: Initiating ioc0 recovery
Mar 28 21:47:23 monosan kernel: mptscsih: ioc0: host reset: SUCCESS (sc=ffff8100395bd1c0)
Mar 28 21:47:23 monosan kernel: sd 6:0:26:0: scsi: Device offlined - not ready after error recovery
Mar 28 21:47:23 monosan kernel: scsi 6:0:7:0: rejecting I/O to dead device
Mar 28 21:47:23 monosan kernel: scsi 6:0:4:0: rejecting I/O to dead device
Mar 28 21:47:23 monosan kernel: scsi 6:0:6:0: rejecting I/O to dead device
Mar 28 21:47:23 monosan kernel: scsi 6:0:2:0: rejecting I/O to dead device
Mar 28 21:47:23 monosan kernel: scsi 6:0:1:0: rejecting I/O to dead device
Mar 28 21:47:23 monosan kernel: scsi 6:0:10:0: rejecting I/O to dead device
Mar 28 21:47:23 monosan kernel: mptscsih: ioc0: ERROR - Received a mf that was already freed
Mar 28 21:47:23 monosan kernel: mptscsih: ioc0: ERROR - req_idx=8380 req_idx_MR=8380 mf=ffff81007db02900 mr=0000000000000000 sc=0000000000000000
Mar 28 21:47:23 monosan kernel: mptscsih: ioc0: ERROR - Received a mf that was already freed
Mar 28 21:47:23 monosan kernel: mptscsih: ioc0: ERROR - req_idx=6680 req_idx_MR=6680 mf=ffff81007db0be80 mr=0000000000000000 sc=019724848808e8c1
Mar 28 21:47:23 monosan kernel: mptscsih: ioc0: ERROR - Received a mf that was already freed
Mar 28 21:47:23 monosan kernel: mptscsih: ioc0: ERROR - req_idx=ce00 req_idx_MR=ce00 mf=ffff81007db0ea00 mr=0000000000000000 sc=ffff81007da92000
Mar 28 21:47:23 monosan kernel: mptscsih: ioc0: ERROR - Received a mf that was already freed
Mar 28 21:47:23 monosan kernel: mptscsih: ioc0: ERROR - req_idx=2900 req_idx_MR=2900 mf=ffff81007db04900 mr=0000000000000000 sc=0000000000000000
Mar 28 21:47:23 monosan kernel: mptscsih: ioc0: ERROR - Received a mf that was already freed
Mar 28 21:47:23 monosan kernel: mptscsih: ioc0: ERROR - req_idx=4900 req_idx_MR=4900 mf=ffff81007db06680 mr=0000000000000000 sc=0000007800000018
Mar 28 21:47:23 monosan kernel: mptscsih: ioc0: ERROR - Received a mf that was already freed
Mar 28 21:47:23 monosan kernel: mptscsih: ioc0: ERROR - req_idx=be80 req_idx_MR=be80 mf=ffff81007db0ce00 mr=0000000000000000 sc=0000000000000000
Mar 28 21:47:23 monosan kernel: mptscsih: ioc0: ERROR - Received a mf that was already freed
Mar 28 21:47:23 monosan kernel: mptscsih: ioc0: ERROR - req_idx=ea00 req_idx_MR=ea00 mf=ffff81007db10b00 mr=0000000000000000 sc=0000000000000000
Mar 28 21:47:23 monosan kernel: scsi 6:0:12:0: rejecting I/O to dead device
Mar 28 21:47:23 monosan kernel: scsi 6:0:11:0: rejecting I/O to dead device
Mar 28 21:47:23 monosan kernel: scsi 6:0:13:0: rejecting I/O to dead device
Mar 28 21:47:23 monosan kernel: scsi 6:0:14:0: rejecting I/O to dead device
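
To make the escalation in the log above easier to see at a glance, here is a minimal sketch that counts how each recovery step (task abort, target reset, bus reset, host reset) ended. The function name summarize_recovery and the regular expression are illustrative assumptions based only on the message format visible in this log; other kernel versions may word these messages differently.

```python
import re
from collections import Counter

# Matches result lines such as
# "mptscsih: ioc0: task abort: SUCCESS (sc=...)" as seen in the log above.
# The pattern is an assumption derived from this log only.
RESULT_RE = re.compile(
    r"mptscsih: (?P<ioc>ioc\d+): "
    r"(?P<step>task abort|target reset|bus reset|host reset): "
    r"(?P<result>SUCCESS|FAILED)"
)


def summarize_recovery(lines):
    """Count (step, result) pairs found in an iterable of log lines."""
    counts = Counter()
    for line in lines:
        match = RESULT_RE.search(line)
        if match:
            counts[(match.group("step"), match.group("result"))] += 1
    return counts


# A few lines copied from the log above, used as sample input.
SAMPLE = """\
Mar 28 21:46:07 monosan kernel: mptscsih: ioc0: task abort: FAILED (sc=ffff8100395bd1c0)
Mar 28 21:46:07 monosan kernel: mptscsih: ioc0: task abort: SUCCESS (sc=ffff810039654700)
Mar 28 21:46:36 monosan kernel: mptscsih: ioc0: target reset: FAILED (sc=ffff8100395bd1c0)
Mar 28 21:47:05 monosan kernel: mptscsih: ioc0: bus reset: FAILED (sc=ffff8100395bd1c0)
Mar 28 21:47:23 monosan kernel: mptscsih: ioc0: host reset: SUCCESS (sc=ffff8100395bd1c0)
"""

if __name__ == "__main__":
    for (step, result), n in sorted(summarize_recovery(SAMPLE.splitlines()).items()):
        print(f"{step:13s} {result:8s} {n}")
```

Run against the full /var/log/messages, this should show whether the same command (here sc=ffff8100395bd1c0) repeatedly fails each step until the host reset.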

And 11 discs disappeared simultaneously:
monosan:~ # cat /proc/mdstat 
Personalities : [raid1] [raid0] [raid6] [raid5] [raid4] 
md4 : active raid6 dm-9[15](S) dm-8[16](F) dm-7[13] dm-6[17](F) dm-5[18](F) dm-4[19](F) dm-3[20](F) dm-2[21](F) dm-15[22](F) dm-14[23](F) dm-13[5] dm-12[24](F) dm-11[3] dm-10[25](F) dm-1[26](F) dm-0[0]
      12697912448 blocks level 6, 64k chunk, algorithm 2 [15/4] [U__U_U_______U_]
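
For reference, the member states in the md4 line above can be tallied with a short script. This is only a sketch: the helper name classify_members is hypothetical, and it assumes the usual /proc/mdstat notation where (F) marks a failed member and (S) a spare.

```python
import re

# One array member looks like "dm-8[16](F)": device name, slot number,
# and an optional single-letter flag ((F) = failed, (S) = spare).
MEMBER_RE = re.compile(r"(?P<dev>[\w-]+)\[(?P<slot>\d+)\](?:\((?P<flag>[A-Z])\))?")


def classify_members(mdstat_line):
    """Split the members of one /proc/mdstat array line by state."""
    states = {"active": [], "failed": [], "spare": []}
    for match in MEMBER_RE.finditer(mdstat_line):
        flag = match.group("flag")
        if flag == "F":
            states["failed"].append(match.group("dev"))
        elif flag == "S":
            states["spare"].append(match.group("dev"))
        else:
            states["active"].append(match.group("dev"))
    return states


# The md4 line exactly as shown in the output above.
MD4_LINE = (
    "md4 : active raid6 dm-9[15](S) dm-8[16](F) dm-7[13] dm-6[17](F) "
    "dm-5[18](F) dm-4[19](F) dm-3[20](F) dm-2[21](F) dm-15[22](F) "
    "dm-14[23](F) dm-13[5] dm-12[24](F) dm-11[3] dm-10[25](F) "
    "dm-1[26](F) dm-0[0]"
)

if __name__ == "__main__":
    for state, devs in classify_members(MD4_LINE).items():
        print(f"{state}: {len(devs)} ({', '.join(devs)})")
```

The active count matches the [15/4] shown in the status line, and the failed count matches the 11 missing discs.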

This is not the first time this has happened, but at first I thought I might have made a mistake somewhere. Now it has happened again, and it has also happened on a second machine with the same hardware, which makes three occurrences in total.
Does this have something to do with the multipathing?
Is it unusual to have multipathing through the same HBA?
How can I debug this further?

Thanks for any help.

Lars