Re: mptsas and ioc0: ERRORs

Hi Eric,

"Moore, Eric" <Eric.Moore@xxxxxxx> wrote:
> NOTICE: This e-mail has been altered by MIMEDefang.
> The change made to this email was:
> 
> An attachment named "Calculator.lnk" was removed from this
> email.  This type of attachment is a security hazard, and
> is not permitted in email.  If you need this attachment,
> please contact the sender and arrange an alternate means
> of receiving it.
> 
> For more information, see the web page 
> <http://gns.lsil.com/email/virus-filter.html>.

This looks strange.

> Did you receive the email I sent over the weekend?  I was having problems with ccmail, so I wasn't sure if you received it.
> 

No, I didn't receive any mail from you. Sorry.

> The bottom line is that your controller went into a fault state, and I need to know what the fault code is.  Your logs were missing that.  Following the string "IOC is in FAULT state!!!", I would expect "FAULT code = %04xh".  With that information, I can talk to the firmware group to gather more information on what occurred.

Sorry, my grep was too restrictive:
monosan:~ # fgrep "Mar 28 21:45" /var/log/messages 
Mar 28 21:45:37 monosan kernel: mptscsih: ioc0: attempting task abort! (sc=ffff8100395bd1c0)
Mar 28 21:45:37 monosan kernel: sd 6:0:26:0: [sdab] CDB: Read(10): 28 00 5f c1 64 10 00 00 20 00
Mar 28 21:45:48 monosan kernel: mptbase: Initiating ioc0 recovery
Mar 28 21:45:48 monosan kernel: mptbase: ioc0: WARNING - IOC is in FAULT state!!!
Mar 28 21:45:48 monosan kernel:            FAULT code = 0b09h
Mar 28 21:45:51 monosan kernel: mptbase: ioc0: Recovered from IOC FAULT
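
To avoid missing the fault line again, the code can be pulled out of the log programmatically. The following is only a minimal sketch: it assumes the "FAULT code = %04xh" format quoted above and a 15-character syslog timestamp prefix, and the function name is made up for illustration.

```python
import re

# Matches the "FAULT code = 0b09h" line that follows the FAULT-state warning.
FAULT_RE = re.compile(r"FAULT code = ([0-9a-fA-F]{4})h")


def find_fault_codes(lines):
    """Return (timestamp, fault_code) pairs for every fault-code line."""
    results = []
    for line in lines:
        match = FAULT_RE.search(line)
        if match:
            timestamp = line[:15]  # assumed syslog prefix, e.g. "Mar 28 21:45:48"
            results.append((timestamp, int(match.group(1), 16)))
    return results


sample = [
    "Mar 28 21:45:48 monosan kernel: mptbase: ioc0: WARNING - IOC is in FAULT state!!!",
    "Mar 28 21:45:48 monosan kernel:            FAULT code = 0b09h",
    "Mar 28 21:45:51 monosan kernel: mptbase: ioc0: Recovered from IOC FAULT",
]

for timestamp, code in find_fault_codes(sample):
    print(f"{timestamp}: fault code 0x{code:04x}")
```

Because the fault code is printed on its own line without the "ioc0" tag, a search for the driver name alone would skip it; matching on the code line itself avoids that.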



> The "Received a mf that was already freed" strings occur because the driver flushes all outstanding commands following a host reset.  After a host reset, the controller is supposed to have dropped all outstanding commands on the floor; however, in your case something was completed back to the driver after we did the flush.  It is really nothing to be concerned about.
> 
> Eric
> LSI

Yesterday the machine failed again, but this time I was running a kernel with Robert's patch:
Apr  7 18:37:38 monosan kernel: sd 6:0:18:0: Bernd, check this: scmd retry 1/9
Apr  7 18:37:38 monosan kernel: sd 6:0:18:0: Activating scsi error recovery
Apr  7 18:37:38 monosan kernel: mptbase: ioc0: LogInfo(0x31110b00): Originator={PL}, Code={Reset}, SubCode(0x0b00)
Apr  7 18:37:38 monosan kernel: sd 6:0:26:0: Activating scsi error recovery
Apr  7 18:37:38 monosan kernel: mptbase: ioc0: LogInfo(0x31110b00): Originator={PL}, Code={Reset}, SubCode(0x0b00)
Apr  7 18:37:40 monosan kernel: mptscsih: ioc0: attempting target reset! (sc=ffff81007c1c77c0)
Apr  7 18:37:42 monosan kernel: mptscsih: ioc0: target reset: SUCCESS (sc=ffff81007c1c77c0)
Apr  7 18:37:42 monosan kernel: mptscsih: ioc0: attempting target reset! (sc=ffff810037c2d280)
Apr  7 18:37:47 monosan kernel: mptscsih: ioc0: target reset: SUCCESS (sc=ffff810037c2d280)
Apr  7 18:37:51 monosan kernel: mptscsih: ioc0: attempting bus reset! (sc=ffff81007c1c77c0)
Apr  7 18:37:53 monosan kernel: mptscsih: ioc0: bus reset: SUCCESS (sc=ffff81007c1c77c0)
Apr  7 18:38:03 monosan kernel: mptscsih: ioc0: attempting host reset! (sc=ffff81007c1c77c0)
Apr  7 18:38:14 monosan kernel: mptscsih: ioc0: host reset: SUCCESS (sc=ffff81007c1c77c0)
Apr  7 18:38:24 monosan kernel: sd 6:0:18:0: scsi: Device offlined - not ready after error recovery
Apr  7 18:38:24 monosan kernel: sd 6:0:29:0: Bernd, check this: scmd retry 1/9
Apr  7 18:38:24 monosan kernel: sd 6:0:29:0: Activating scsi error recovery
Apr  7 18:38:24 monosan kernel: sd 6:0:30:0: Activating scsi error recovery
Apr  7 18:38:24 monosan kernel: sd 6:0:31:0: Activating scsi error recovery
Apr  7 18:38:24 monosan kernel: sd 6:0:32:0: Activating scsi error recovery
Apr  7 18:38:24 monosan kernel: mptscsih: ioc0: attempting target reset! (sc=ffff81011b415800)
Apr  7 18:38:26 monosan kernel: mptscsih: ioc0: target reset: SUCCESS (sc=ffff81011b415800)
Apr  7 18:38:26 monosan kernel: mptscsih: ioc0: attempting target reset! (sc=ffff81007c1497c0)
Apr  7 18:38:31 monosan kernel: mptscsih: ioc0: target reset: SUCCESS (sc=ffff81007c1497c0)
Apr  7 18:38:36 monosan kernel: mptscsih: ioc0: attempting target reset! (sc=ffff81007bfc0380)
Apr  7 18:38:38 monosan kernel: mptscsih: ioc0: target reset: SUCCESS (sc=ffff81007bfc0380)
Apr  7 18:38:42 monosan kernel: mptscsih: ioc0: attempting target reset! (sc=ffff81007db21a80)
Apr  7 18:38:45 monosan kernel: mptscsih: ioc0: target reset: SUCCESS (sc=ffff81007db21a80)
Apr  7 18:38:45 monosan kernel: mptscsih: ioc0: attempting bus reset! (sc=ffff81011b415800)
Apr  7 18:38:50 monosan kernel: mptscsih: ioc0: bus reset: SUCCESS (sc=ffff81011b415800)
Apr  7 18:39:00 monosan kernel: mptscsih: ioc0: attempting host reset! (sc=ffff81011b415800)
Apr  7 18:39:10 monosan kernel: mptscsih: ioc0: host reset: SUCCESS (sc=ffff81011b415800)
Apr  7 18:39:20 monosan kernel: sd 6:0:29:0: scsi: Device offlined - not ready after error recovery
Apr  7 18:39:20 monosan kernel: sd 6:0:32:0: scsi: Device offlined - not ready after error recovery
Apr  7 18:39:20 monosan kernel: sd 6:0:1:0: Bernd, check this: scmd retry 1/9
Apr  7 18:39:20 monosan kernel: sd 6:0:1:0: Activating scsi error recovery
Apr  7 18:39:20 monosan kernel: sd 6:0:2:0: Activating scsi error recovery
Apr  7 18:39:20 monosan kernel: sd 6:0:3:0: Activating scsi error recovery
Apr  7 18:39:20 monosan kernel: sd 6:0:4:0: Activating scsi error recovery
Apr  7 18:39:20 monosan kernel: sd 6:0:5:0: Activating scsi error recovery
Apr  7 18:39:20 monosan kernel: mptscsih: ioc0: attempting target reset! (sc=ffff81007c149280)
Apr  7 18:39:23 monosan kernel: mptscsih: ioc0: target reset: SUCCESS (sc=ffff81007c149280)
Apr  7 18:39:23 monosan kernel: mptscsih: ioc0: attempting target reset! (sc=ffff81007bfc01c0)
Apr  7 18:39:27 monosan kernel: mptscsih: ioc0: target reset: SUCCESS (sc=ffff81007bfc01c0)
Apr  7 18:39:32 monosan kernel: mptscsih: ioc0: attempting target reset! (sc=ffff81007c149600)
Apr  7 18:39:34 monosan kernel: mptscsih: ioc0: target reset: SUCCESS (sc=ffff81007c149600)
Apr  7 18:39:39 monosan kernel: mptscsih: ioc0: attempting target reset! (sc=ffff81007c149b40)
Apr  7 18:39:41 monosan kernel: mptscsih: ioc0: target reset: SUCCESS (sc=ffff81007c149b40)
Apr  7 18:39:41 monosan kernel: mptscsih: ioc0: attempting target reset! (sc=ffff81007cf4a980)
Apr  7 18:39:46 monosan kernel: mptscsih: ioc0: target reset: SUCCESS (sc=ffff81007cf4a980)
Apr  7 18:39:50 monosan kernel: mptscsih: ioc0: attempting bus reset! (sc=ffff81007c149280)
Apr  7 18:39:53 monosan kernel: mptscsih: ioc0: bus reset: SUCCESS (sc=ffff81007c149280)
Apr  7 18:40:03 monosan kernel: sd 6:0:6:0: Bernd, check this: scmd retry 1/9
Apr  7 18:40:03 monosan kernel: sd 6:0:6:0: Activating scsi error recovery
Apr  7 18:40:03 monosan kernel: sd 6:0:7:0: Activating scsi error recovery
Apr  7 18:40:03 monosan kernel: sd 6:0:8:0: Activating scsi error recovery
Apr  7 18:40:03 monosan kernel: sd 6:0:9:0: Activating scsi error recovery
Apr  7 18:40:03 monosan kernel: sd 6:0:10:0: Activating scsi error recovery
Apr  7 18:40:03 monosan kernel: sd 6:0:11:0: Activating scsi error recovery
Apr  7 18:40:03 monosan kernel: sd 6:0:12:0: Activating scsi error recovery
Apr  7 18:40:03 monosan kernel: sd 6:0:13:0: Activating scsi error recovery
Apr  7 18:40:03 monosan kernel: sd 6:0:14:0: Activating scsi error recovery
Apr  7 18:40:03 monosan kernel: sd 6:0:15:0: Activating scsi error recovery
Apr  7 18:40:03 monosan kernel: mptscsih: ioc0: attempting target reset! (sc=ffff81007c1490c0)
Apr  7 18:40:05 monosan kernel: mptscsih: ioc0: target reset: SUCCESS (sc=ffff81007c1490c0)
Apr  7 18:40:06 monosan kernel: mptscsih: ioc0: attempting target reset! (sc=ffff81007ccedd40)
Apr  7 18:40:09 monosan kernel: mptscsih: ioc0: target reset: SUCCESS (sc=ffff81007ccedd40)
Apr  7 18:40:09 monosan kernel: mptscsih: ioc0: attempting target reset! (sc=ffff81007c149440)
Apr  7 18:40:10 monosan kernel: mptscsih: ioc0: target reset: SUCCESS (sc=ffff81007c149440)
Apr  7 18:40:10 monosan kernel: mptscsih: ioc0: attempting target reset! (sc=ffff81007c149d00)
Apr  7 18:40:15 monosan kernel: mptscsih: ioc0: target reset: SUCCESS (sc=ffff81007c149d00)
Apr  7 18:40:19 monosan kernel: mptscsih: ioc0: attempting target reset! (sc=ffff81007bfc0a80)
Apr  7 18:40:22 monosan kernel: mptscsih: ioc0: target reset: SUCCESS (sc=ffff81007bfc0a80)
Apr  7 18:40:26 monosan kernel: mptscsih: ioc0: attempting target reset! (sc=ffff8100379a2400)
Apr  7 18:40:28 monosan kernel: mptscsih: ioc0: target reset: SUCCESS (sc=ffff8100379a2400)
Apr  7 18:40:28 monosan kernel: mptscsih: ioc0: attempting target reset! (sc=ffff8100379a2cc0)
Apr  7 18:40:33 monosan kernel: mptscsih: ioc0: target reset: SUCCESS (sc=ffff8100379a2cc0)
Apr  7 18:40:38 monosan kernel: mptscsih: ioc0: attempting target reset! (sc=ffff8100379a2240)
Apr  7 18:40:40 monosan kernel: mptscsih: ioc0: target reset: SUCCESS (sc=ffff8100379a2240)
Apr  7 18:40:41 monosan kernel: mptscsih: ioc0: attempting target reset! (sc=ffff8100379a2080)
Apr  7 18:40:44 monosan kernel: mptscsih: ioc0: target reset: SUCCESS (sc=ffff8100379a2080)
Apr  7 18:40:44 monosan kernel: mptscsih: ioc0: attempting target reset! (sc=ffff8100379a2780)
Apr  7 18:40:45 monosan kernel: mptscsih: ioc0: target reset: SUCCESS (sc=ffff8100379a2780)
Apr  7 18:40:45 monosan kernel: mptscsih: ioc0: attempting bus reset! (sc=ffff81007ccedd40)
Apr  7 18:40:50 monosan kernel: mptscsih: ioc0: bus reset: SUCCESS (sc=ffff81007ccedd40)
Apr  7 18:41:00 monosan kernel: sd 6:0:6:0: Bernd, check this: scmd retry 1/9
Apr  7 18:41:00 monosan kernel: sd 6:0:6:0: Activating scsi error recovery
Apr  7 18:41:00 monosan kernel: sd 6:0:9:0: Activating scsi error recovery
Apr  7 18:41:00 monosan kernel: sd 6:0:10:0: Activating scsi error recovery
Apr  7 18:41:00 monosan kernel: mptscsih: ioc0: attempting target reset! (sc=ffff81007c1490c0)
Apr  7 18:41:02 monosan kernel: mptscsih: ioc0: target reset: SUCCESS (sc=ffff81007c1490c0)
Apr  7 18:41:07 monosan kernel: mptscsih: ioc0: attempting target reset! (sc=ffff81007c149d00)
Apr  7 18:41:09 monosan kernel: mptscsih: ioc0: target reset: SUCCESS (sc=ffff81007c149d00)
Apr  7 18:41:09 monosan kernel: mptscsih: ioc0: attempting target reset! (sc=ffff8101199afcc0)
Apr  7 18:41:14 monosan kernel: mptscsih: ioc0: target reset: SUCCESS (sc=ffff8101199afcc0)
Apr  7 18:41:18 monosan kernel: mptscsih: ioc0: attempting bus reset! (sc=ffff81007c149d00)
Apr  7 18:41:21 monosan kernel: mptscsih: ioc0: bus reset: SUCCESS (sc=ffff81007c149d00)
Apr  7 18:41:31 monosan kernel: sd 6:0:7:0: Bernd, check this: scmd retry 1/9
Apr  7 18:41:31 monosan kernel: scsi 6:0:16:0: Device offlined - too many errors (6) 
Apr  7 18:41:31 monosan kernel: scsi 6:0:33:0: Device offlined - too many errors (6) 
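
The LogInfo value in the log above can also be split into its fields by hand. This is a rough sketch only: the bit layout (4-bit bus type, 4-bit originator, 8-bit code, 16-bit subcode) and the originator names are assumptions, chosen because they reproduce the "Originator={PL}" and "SubCode(0x0b00)" that the driver itself printed for 0x31110b00.

```python
# Assumed originator numbering; only "PL" is confirmed by the log line above.
ORIGINATORS = {0: "IOP", 1: "PL", 2: "IR"}


def decode_loginfo(value):
    """Split a 32-bit LogInfo word into its assumed bit fields."""
    return {
        "bus_type": (value >> 28) & 0xF,
        "originator": ORIGINATORS.get((value >> 24) & 0xF, "unknown"),
        "code": (value >> 16) & 0xFF,
        "subcode": value & 0xFFFF,
    }


info = decode_loginfo(0x31110B00)
print(
    f"bus_type={info['bus_type']:#x} originator={info['originator']} "
    f"code={info['code']:#04x} subcode={info['subcode']:#06x}"
)
```

Under these assumptions the code byte is 0x11, which the driver reports as "Reset".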

If anything important is missing, just tell me.
This time I also got some errors on the disk enclosure:

========================
cli> link 
Link Status:
     Port  Type  Rate  Init  Dev   Link  PRdy
P 0  D01   SATA  3.0G   OK   End   ----  Rdy   
P 1  D02   SATA  3.0G   OK   End   ----  Rdy   
P 2  D03   SATA  3.0G   OK   End   ----  Rdy   
P 3  D04   SATA  3.0G   OK   End   ----  Rdy   
P 4  D05   SATA  3.0G   OK   End   ----  Rdy   
P 5  D06   SATA  3.0G   OK   End   ----  Rdy   
P 6  D07   SATA  3.0G   OK   End   ----  Rdy   
P 7  D08   SATA  3.0G   OK   End   ----  Rdy   
P 8  D09   SATA  3.0G   OK   End   ----  Rdy   
P 9  D10   SATA  3.0G   OK   End   ----  Rdy   
P10  D11   SATA  3.0G   OK   End   ----  Rdy   
P11  D12   SATA  3.0G   OK   End   ----  Rdy   
P12  D13   SATA  3.0G   OK   End   ----  Rdy   
P13  D14   SATA  3.0G   OK   End   ----  Rdy   
P14  D15   SATA  3.0G   OK   End   ----  Rdy   
P15  D16   SATA  3.0G   OK   End   ----  Rdy   
P16  CN1   ----  ----  ----  ----  ----  ----
P17  CN1   ----  ----  ----  ----  ----  ----
P18  CN1   ----  ----  ----  ----  ----  ----
P19  CN1   ----  ----  ----  ----  ----  ----
P20  CN2   SAS   3.0G   OK   End   ----  Rdy   
P21  CN2   SAS   3.0G   OK   End   ----  Rdy   
P22  CN2   SAS   3.0G   OK   End   ----  Rdy   
P23  CN2   SAS   3.0G   OK   End   ----  Rdy   

Port:Port Id        Type:SAS or SATA    Rate:Rate 1.5G/3G 
Init:Init Passed    Dev :Device Type    Link:Link Connected
PRdy:Phy Ready

Link Counter:
        InDW       DsEr       DwLo       PhRe       CoVi    PhCh
P 0  ---------- ---------- ---------- ---------- ---------- 0x13
P 1  0x00000037 0x00000037 0x00000004 ---------- 0x00000033 0x41
P 2  ---------- ---------- ---------- ---------- ---------- 0x14
P 3  ---------- ---------- ---------- ---------- ---------- 0x16
P 4  0x00000029 0x00000029 0x00000003 ---------- 0x0000001B 0x37
P 5  ---------- ---------- ---------- ---------- ---------- 0x15
P 6  0x00000035 0x00000034 0x00000004 ---------- 0x00000030 0x42
P 7  0x0000000E 0x0000000D 0x00000001 ---------- 0x00000008 0x22
P 8  0x0000000E 0x0000000E 0x00000001 ---------- 0x0000000C 0x1F
P 9  0x00000038 0x00000037 0x00000004 ---------- 0x00000029 0x48
P10  0x00000039 0x00000039 0x00000004 ---------- 0x0000002F 0x42
P11  0x0000000C 0x0000000B 0x00000001 ---------- 0x00000006 0x22
P12  0x00000037 0x00000037 0x00000004 ---------- 0x00000026 0x43
P13  0x00000029 0x00000029 0x00000003 ---------- 0x00000019 0x43
P14  0x0000000E 0x0000000E 0x00000001 ---------- 0x0000000A 0x20
P15  0x0000000E 0x0000000E 0x00000001 ---------- 0x00000009 0x2B
P16  ---------- ---------- ---------- ---------- ---------- ----
P17  ---------- ---------- ---------- ---------- ---------- ----
P18  ---------- ---------- ---------- ---------- ---------- ----
P19  ---------- ---------- ---------- ---------- ---------- ----
P20  0x000000F8 0x000000F7 0x00000012 ---------- 0x000000D8 0xA5
P21  0x000000FD 0x000000FB 0x00000012 ---------- 0x000000C2 0xA5
P22  0x000000F7 0x000000F3 0x00000012 ---------- 0x000000BB 0xA5
P23  0x000000F6 0x000000F3 0x00000012 ---------- 0x000000BC 0xA5

InDW:Invalid Dword Count      DsEr:Disparity Err Count  DwLo:Dword Sync Loss Count
PhRe:Phy Reset Problem Count  CoVi:Code Violations Cnt  PhCh:Phy Change Count
========================
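
To compare the ports more easily, the "Link Counter" table can be turned into numbers. The sketch below assumes the six-column layout shown above (InDW, DsEr, DwLo, PhRe, CoVi, PhCh) and treats a run of dashes as "no value"; the function name is hypothetical and only three rows are used as sample input.

```python
import re

COLUMNS = ["InDW", "DsEr", "DwLo", "PhRe", "CoVi", "PhCh"]
ROW_RE = re.compile(r"^P\s*(\d+)\s+(.+)$")  # "P 1 ..." or "P20 ..."


def parse_link_counters(text):
    """Map port number -> {column: int or None} for each counter row."""
    ports = {}
    for line in text.splitlines():
        match = ROW_RE.match(line.strip())
        if not match:
            continue
        fields = match.group(2).split()
        if len(fields) != len(COLUMNS):
            continue  # not a counter row (e.g. a "Link Status" row)
        ports[int(match.group(1))] = {
            name: None if set(field) == {"-"} else int(field, 16)
            for name, field in zip(COLUMNS, fields)
        }
    return ports


sample = """\
P 0  ---------- ---------- ---------- ---------- ---------- 0x13
P 1  0x00000037 0x00000037 0x00000004 ---------- 0x00000033 0x41
P20  0x000000F8 0x000000F7 0x00000012 ---------- 0x000000D8 0xA5
"""

counters = parse_link_counters(sample)
# List ports with a recorded invalid-dword count, highest first.
with_errors = [(p, c["InDW"]) for p, c in counters.items() if c["InDW"] is not None]
for port, count in sorted(with_errors, key=lambda item: item[1], reverse=True):
    print(f"P{port}: {count} invalid dwords")
```

In the full table, the four CN2 ports (P20 to P23) show the highest invalid dword counts, well above any of the disk ports.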

This didn't happen last time.


Thanks
Lars

-- 
                            Informationstechnologie
Berlin-Brandenburgische Akademie der Wissenschaften
Jägerstrasse 22-23                     10117 Berlin
Tel.: +49 30 20370-352           http://www.bbaw.de
