Re: [PATCH 0/5] mpt fusion error handler patches

On Fri, Sep 12, 2008 at 08:57:40PM +0200, Bernd Schubert wrote:
> Hello,
> 
> I'm going to submit several error handler patches for the MPT fusion 
> driver. Their main purpose is to fix errors that occur on the second 
> port of dual-port 53C1030-based HBAs.
> As I complained on this list some time ago, a device failure on one port 
> of an LSI22320R HBA also causes failures of innocent devices on the other 
> port of the same HBA. To help debug this, Eric Moore sent me a fusion-tip 
> version of the driver, which we have been using ever since. However, 
> this version has issues with SAS HBAs and probably won't work with recent 
> kernel versions either. So I spent quite some time figuring out why the 
> fusion-tip version (4.x) of the driver doesn't have the issue.
> 
> Below I will provide some examples of these issues. Errors on one of the attached 
> scsi devices have been simulated using lsiutil by doing target of one of the attached 

This was supposed to be "... by doing target resets of one ..."

> devices on one of the ports (5 0 4 0).
> 
> Unpatched 2.6.26 + a few scsi diagnostics and error handler patches:
> 
> [  224.819697] sd 5:0:4:0: last recovery: 4294911483, now: 4294948403
> [  224.826142] sd 5:0:4:0: [sdc] Result: hostbyte=DID_OK driverbyte=DRIVER_OK,SUGGEST_OK
> [  224.831676] sd 5:0:4:0: [sdc] CDB: Write(16): 8a 00 00 00 00 01 0c 27 2e 98 00 00 04 00 00 00
> [  224.842803] sd 5:0:4:0: Activating scsi error recovery (1)
> [  224.857824] sd 5:0:4:0: trying to abort command
> [  224.865697] mptscsih: ioc1: attempting task abort! (sc=ffff8100f8f10000)
> [  224.870572] sd 5:0:4:0: [sdc] CDB: Write(16): 8a 00 00 00 00 01 0c 27 2e 98 00 00 04 00 00 00
> [  227.047968] mptbase: ioc1: Initiating recovery
> [  229.481849] sd 5:0:4:0: mptscsih: ioc1: completing cmds: fw_channel 0, fw_id 4, sc=ffff8100f8fbb180, mf = ffff8100
> [...]
> [  364.322013] mptscsih: ioc1: bus reset: SUCCESS (sc=ffff8100f8f11b80)
> [  371.924342] sd 4:0:2:0: scmd retry 6/6
> [  371.928364] sd 4:0:2:0: last recovery: 0, now: 4294985148
> [  371.932924] sd 4:0:2:0: [sda] Result: hostbyte=DID_SOFT_ERROR driverbyte=DRIVER_OK,SUGGEST_OK
> [  371.932924] sd 4:0:2:0: [sda] CDB: Write(16): 8a 00 00 00 00 01 31 8b 4a 4e 00 00 00 39 00 00
> [  371.932924] sd 4:0:2:0: Activating scsi error recovery (1)
> [  371.960382] sd 4:0:2:0: Sending BDR 0xffff81007eaf2538
> [  371.984936] sd 4:0:2:0: trying device reset
> [  371.989426] mptscsih: ioc0: attempting target reset! (sc=ffff81007eb7c780)
> 
> As you can see, target 4 0 2 0, which is on ioc0, suddenly fails as well. In the end:
> 
> [  398.596119] sd 4:0:2:0: [sda] Result: hostbyte=DID_SOFT_ERROR driverbyte=DRIVER_OK,SUGGEST_OK
> [  398.605291] end_request: I/O error, dev sda, sector 5126179406
> [  398.612360] end_request: I/O error, dev sda, sector 5126179406
> [  398.617818]  target4:0:2: Beginning Domain Validation
> 
> So the innocent device sda (which is really another device) failed.
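
The cross-port effect described above can be checked mechanically by listing every SCSI address that enters error recovery in a log. The following is only a minimal sketch, assuming the "Activating scsi error recovery" message format shown in the quoted log; the function name devices_in_recovery and the SAMPLE text are hypothetical and not part of the patches.

```python
import re

# Matches kernel lines such as:
#   [  224.842803] sd 5:0:4:0: Activating scsi error recovery (1)
# The message format is assumed from the log excerpt quoted above.
RECOVERY_RE = re.compile(
    r"\[\s*(?P<ts>\d+\.\d+)\]\s+sd (?P<addr>\d+:\d+:\d+:\d+): "
    r"Activating scsi error recovery"
)


def devices_in_recovery(log_text):
    """Return {scsi_address: first_timestamp} for devices entering recovery."""
    first_seen = {}
    for line in log_text.splitlines():
        match = RECOVERY_RE.search(line)
        if match and match.group("addr") not in first_seen:
            first_seen[match.group("addr")] = float(match.group("ts"))
    return first_seen


# Hypothetical sample assembled from three lines of the unpatched log.
SAMPLE = """\
[  224.842803] sd 5:0:4:0: Activating scsi error recovery (1)
[  364.322013] mptscsih: ioc1: bus reset: SUCCESS (sc=ffff8100f8f11b80)
[  371.932924] sd 4:0:2:0: Activating scsi error recovery (1)
"""

if __name__ == "__main__":
    for addr, ts in devices_in_recovery(SAMPLE).items():
        print(f"{addr} entered error recovery at {ts:.6f}s")
```

If only the device that received the simulated errors (5:0:4:0) is listed, the other port was left alone; a second address such as 4:0:2:0 indicates the collateral failure.
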
> 
> Now the same with patches applied, but with the soft reset-handler deactivated:
> 
> [  912.861708] sd 5:0:4:0: last recovery: 4295082734, now: 4295120387
> [  912.868130] sd 5:0:4:0: [sdc] Result: hostbyte=DID_OK driverbyte=DRIVER_OK,SUGGEST_
> 
> [  912.873757] sd 5:0:4:0: [sdc] CDB: Write(10): 2a 00 73 11 33 08 00 04 00 00
> [  912.873757] sd 5:0:4:0: Activating scsi error recovery (2)
> [  912.889492] sd 5:0:4:0: trying to abort command
> [  912.894118] mptscsih: ioc1: attempting task abort! (sc=ffff8100e361d180)
> [  912.900951] sd 5:0:4:0: [sdc] CDB: Write(10): 2a 00 73 11 33 08 00 04 00 00
> [  913.025771] mptscsih: ioc1: task abort: FAILED (sc=ffff8100e361d180)
> [  913.032269] sd 5:0:4:0: Sending BDR 0xffff8100f99e1428
> [  913.040264] sd 5:0:4:0: trying device reset
> [  913.044597] mptscsih: ioc1: attempting target reset! (sc=ffff8100e361d180)
> [  913.049955] sd 5:0:4:0: [sdc] CDB: Write(10): 2a 00 73 11 33 08 00 04 00 00
> [  913.177284] mptscsih: ioc1: target reset: FAILED (sc=ffff8100e361d180)
> [  913.181946] Sending BRST chan: 0
> [  913.185945] sd 5:0:4:0: trying bus reset
> [  913.189974] mptscsih: ioc1: attempting bus reset! (sc=ffff8100e361d180)
> [  913.197310] sd 5:0:4:0: [sdc] CDB: Write(10): 2a 00 73 11 33 08 00 04 00 00
> [  913.325079] mptscsih: ioc1: bus reset: FAILED (sc=ffff8100e361d180)
> [  913.329668] sd 5:0:4:0: trying host reset
> [  913.333864] mptscsih: ioc1: attempting host reset! (sc=ffff8100e361d180)
> [  913.341832] mptscsih: ioc1: Skipping hard reset in order to prevent failures on ioc
> 
> [  913.349821] mptscsih: ioc1: host reset: FAILED (sc=ffff8100e361d180)
> [  913.356704] sd 5:0:4:0: Device offlined - not ready after error recovery
> [  913.363199] sd 5:0:4:0: [sdc] Result: hostbyte=DID_OK driverbyte=DRIVER_TIMEOUT,SUGGEST_OK
> 
> => The device was not recovered, but at least 4 0 2 0 didn't fail :)
> 
> Now with all patches applied:
> 
> [  214.903699] sd 5:0:4:0: last recovery: 0, now: 4294945953
> [  214.910652] sd 5:0:4:0: [sdc] Result: hostbyte=DID_OK driverbyte=DRIVER_OK,SUGGEST_OK
> [  214.918652] sd 5:0:4:0: [sdc] CDB: Write(16): 8a 00 00 00 00 01 31 8b 9c e7 00 00 00 39 00 00
> [  214.918652] sd 5:0:4:0: Activating scsi error recovery (1)
> [  214.934655] sd 5:0:4:0: trying to abort command
> [  214.939581] mptscsih: ioc1: attempting task abort! (sc=ffff8100f9be0c80)
> [  214.947581] sd 5:0:4:0: [sdc] CDB: Write(16): 8a 00 00 00 00 01 31 8b 9c e7 00 00 00 39 00 00
> [  215.077430] mptscsih: ioc1: task abort: FAILED (sc=ffff8100f9be0c80)
> [  215.083645] sd 5:0:4:0: Sending BDR 0xffff81007eb51428
> [  215.090298] sd 5:0:4:0: trying device reset
> [  215.094810] mptscsih: ioc1: attempting target reset! (sc=ffff8100f9be0c80)
> [  215.101917] sd 5:0:4:0: [sdc] CDB: Write(16): 8a 00 00 00 00 01 31 8b 9c e7 00 00 00 39 00 00
> [  215.229659] mptscsih: ioc1: target reset: FAILED (sc=ffff8100f9be0c80)
> [  215.236367] Sending BRST chan: 0
> [  215.240173] sd 5:0:4:0: trying bus reset
> [  215.244313] mptscsih: ioc1: attempting bus reset! (sc=ffff8100f9be0c80)
> [  215.251731] sd 5:0:4:0: [sdc] CDB: Write(16): 8a 00 00 00 00 01 31 8b 9c e7 00 00 00 39 00 00
> [  215.382449] mptscsih: ioc1: bus reset: FAILED (sc=ffff8100f9be0c80)
> [  215.388946] sd 5:0:4:0: trying host reset
> [  215.393162] mptscsih: ioc1: attempting host reset! (sc=ffff8100f9be0c80)
> [  215.400489] sd 5:0:4:0: mptscsih: ioc1: completing cmds: fw_channel 0, fw_id 4, sc=ffff8100f9be0c80, mf = ffff8105
> [  217.317914] mptbase: ioc1: SoftResetHandler: completed (1 seconds): SUCCESS
> [  217.324924] mptscsih: ioc1: host reset: SUCCESS (sc=ffff8100f9be0c80)
> [  227.546452]  target5:0:4: Beginning Domain Validation
> [  227.578775]  target5:0:4: Ending Domain Validation
> [  227.584099]  target5:0:4: FAST-160 WIDE SCSI 320.0 MB/s DT IU QAS PCOMP (6.25 ns, offset 127)
> [  227.596959]  target5:0:5: Beginning Domain Validation
> [  227.651196]  target5:0:5: Ending Domain Validation
> [  227.656977]  target5:0:5: FAST-160 WIDE SCSI 320.0 MB/s DT IU QAS PCOMP (6.25 ns, offset 127)
> 
> 
> -- 
> Bernd Schubert
> Q-Leap Networks GmbH
> --
> To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
> the body of a message to majordomo@xxxxxxxxxxxxxxx
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> 
