mpt fusion "MID not found" and "DMA Error" under high streaming load with 2.6.27 and 2.6.29

We have an IBM x3550 server with 32G of RAM that has an LSI53C1030 card connected to two external SATA-to-SCSI units. This server has been running fine under modest load for several *years* with the exact same hardware and various 2.6.x kernel versions (regularly upgraded) with no problems. The usual I/O on this machine (an email server) is lots of small random I/Os across many millions of files.

Yesterday we used the server to unpack a multi-gigabyte data file to a partition, causing a huge streaming I/O run. This repeatedly caused the mpt fusion driver/SCSI bus to get confused, producing various batches of errors such as:

[1853281.761689] mptbase: ioc0: LogInfo(0x11070000): F/W: DMA Error
[1853281.761719] mptbase: ioc0: LogInfo(0x11070000): F/W: DMA Error
[1853281.761748] mptbase: ioc0: LogInfo(0x11070000): F/W: DMA Error

[1861597.029169] mptscsih: ioc0: attempting task abort! (sc=ffff88031d71d200)
[1861597.029203] sd 1:0:0:1: [sdc] CDB: cdb[0]=0x28: 28 00 0b 7a 3d 7d 00 00 08 00
[1861597.029272] mptscsih: ioc0: task abort: SUCCESS (sc=ffff88031d71d200)
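
For reference, the CDB in the task abort message can be decoded by hand. Opcode 0x28 is READ(10), with a big-endian logical block address in bytes 2-5 and a transfer length in bytes 7-8. The small Python sketch below does that decoding; the function name decode_read10 is just an illustrative helper, not part of any driver or tool.

```python
def decode_read10(cdb_hex):
    """Decode a READ(10) CDB given as space-separated hex bytes."""
    cdb = bytes.fromhex(cdb_hex.replace(" ", ""))
    if len(cdb) != 10 or cdb[0] != 0x28:
        raise ValueError("not a 10-byte READ(10) CDB")
    return {
        "opcode": cdb[0],
        # Bytes 2-5: logical block address, big-endian
        "lba": int.from_bytes(cdb[2:6], "big"),
        # Bytes 7-8: transfer length in blocks, big-endian
        "blocks": int.from_bytes(cdb[7:9], "big"),
    }


info = decode_read10("28 00 0b 7a 3d 7d 00 00 08 00")
print(info)
```

So the aborted command was a small read of 8 blocks at LBA 0x0b7a3d7d on sdc, not one of the large streaming writes.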

[1862303.900733] lost page write due to I/O error on sdh2
[1862303.900774] sd 2:0:0:3: rejecting I/O to offline device
[1862303.900809] sd 2:0:0:3: [sdi] Unhandled error code
[1862303.900834] sd 2:0:0:3: [sdi] Result: hostbyte=0x01 driverbyte=0x00
[1862303.900863] end_request: I/O error, dev sdi, sector 1936592578
[1862303.900891] Buffer I/O error on device sdi4, logical block 22349051
[1862303.900919] lost page write due to I/O error on sdi4
[1862313.681008] mptbase: ioc0: ERROR - Wait IOC_READY state timeout(15)!

[1862330.893017] target1:0:0: Beginning Domain Validation
[1862330.899215] target1:0:0: Ending Domain Validation
[1862330.926581] target1:0:0: FAST-80 WIDE SCSI 160.0 MB/s DT (12.5 ns, offset 127)

[1862393.257341] mptbase: ioc0: LogInfo(0x11010001): F/W: bug! MID not found
[1862393.257373] mptbase: ioc0: LogInfo(0x11010001): F/W: bug! MID not found
[1862393.257406] mptbase: ioc0: LogInfo(0x11010001): F/W: bug! MID not found
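
Since these messages arrive in batches, a count per LogInfo code may be more useful than the raw log. Below is a minimal Python sketch that tallies mptbase LogInfo lines. The regular expression is based only on the message format seen in the excerpts above, and the name tally_loginfo is a hypothetical helper.

```python
import re
from collections import Counter

# Matches lines such as:
# [1853281.761689] mptbase: ioc0: LogInfo(0x11070000): F/W: DMA Error
LOGINFO_RE = re.compile(r"mptbase: (ioc\d+): LogInfo\((0x[0-9a-fA-F]+)\): (.*)")


def tally_loginfo(lines):
    """Count mptbase LogInfo messages per (controller, code, text)."""
    counts = Counter()
    for line in lines:
        match = LOGINFO_RE.search(line)
        if match:
            ioc, code, text = match.groups()
            counts[(ioc, code, text.strip())] += 1
    return counts


sample = [
    "[1853281.761689] mptbase: ioc0: LogInfo(0x11070000): F/W: DMA Error",
    "[1853281.761719] mptbase: ioc0: LogInfo(0x11070000): F/W: DMA Error",
    "[1862393.257341] mptbase: ioc0: LogInfo(0x11010001): F/W: bug! MID not found",
    "[1862303.900774] sd 2:0:0:3: rejecting I/O to offline device",
]

counts = tally_loginfo(sample)
for (ioc, code, text), n in counts.most_common():
    print(f"{n:6d}  {ioc}  {code}  {text}")
```

The same function accepts any iterable of lines, so it could be fed saved dmesg output read from a file.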

Initially this machine had a 2.6.29.3-amd64 kernel (vanilla, mpt driver compiled in), but we also rebooted into a 2.6.27.24-amd64 kernel (vanilla, mpt driver compiled in), and were able to reproduce essentially the same problem at will by running the streaming read/write workload. Example after rebooting into 2.6.27.24:

[ 416.849313] mptbase: ioc0: LogInfo(0x11070000): F/W: DMA Error
[ 416.849384] mptbase: ioc0: LogInfo(0x11070000): F/W: DMA Error
[ 416.849454] mptbase: ioc0: LogInfo(0x11070000): F/W: DMA Error
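
For anyone wanting to approximate the load, here is a rough Python sketch of a sequential write followed by a sequential read. It is not the exact workload (that was unpacking a multi-gigabyte data file); the function names, the 1 MiB chunk size, and any path passed in are assumptions for illustration only.

```python
import os


def stream_write(path, total_bytes, chunk_size=1024 * 1024):
    """Write total_bytes of zeros sequentially to path, then fsync."""
    chunk = b"\0" * chunk_size
    written = 0
    with open(path, "wb") as f:
        while written < total_bytes:
            n = min(chunk_size, total_bytes - written)
            f.write(chunk[:n])
            written += n
        # Push the data out to the device rather than leaving it in page cache
        f.flush()
        os.fsync(f.fileno())
    return written


def stream_read(path, chunk_size=1024 * 1024):
    """Read path sequentially in chunk_size pieces; return bytes read."""
    total = 0
    with open(path, "rb") as f:
        while True:
            data = f.read(chunk_size)
            if not data:
                break
            total += len(data)
    return total
```

Calling stream_write with a path on one of the affected partitions and a size of several gigabytes, followed by stream_read on the same path, gives a load of roughly the same shape. With 32G of RAM, the read may be served largely from page cache unless the file is larger than memory.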

Hardware revision is:

$ dmesg | grep FwRev
[ 5.959949] scsi1 : ioc0: LSI53C1030 C0, FwRev=01032700h, Ports=1, MaxQ=255, IRQ=19
[ 11.478451] scsi2 : ioc1: LSI53C1030 C0, FwRev=01032700h, Ports=1, MaxQ=255, IRQ=16

Let me know what debugging information I can supply to help with this; we should be able to reproduce it again quite easily.

Rob

