I will give it a go. I am use the latest stable so the patch was not clean. I will pull down the latest if need be... Eddie On Tue, May 18, 2010 at 3:05 PM, Giridhar Malavali <giridhar.malavali@xxxxxxxxxx> wrote: > > > On Tue, 18 May 2010, Eddie Williams wrote: > >> Here is a snipit of the log with the timestamp cut off to avoid making >> the mail so large. I can send you the full log directly if you like. >> >> driver load: >> QLogic Fibre Channel HBA Driver: 8.03.01-k10-debug >> qla2xxx 0000:06:01.0: PCI INT A -> GSI 74 (level, low) -> IRQ 74 >> qla2xxx 0000:06:01.0: Found an ISP2312, irq 74, iobase 0xffffc90000dba000 >> qla2xxx 0000:06:01.0: Configuring PCI space... >> qla2xxx 0000:06:01.0: Configure NVRAM parameters... >> qla2xxx 0000:06:01.0: Verifying loaded RISC code... >> scsi(4): **** Load RISC code **** >> qla2xxx 0000:06:01.0: firmware: requesting ql2300_fw.bin >> scsi(4): Verifying Checksum of loaded RISC code. >> scsi(4): Checksum OK, start firmware. >> qla2xxx 0000:06:01.0: Allocated (412 KB) for firmware dump... >> scsi(4): Issue init firmware. >> DEBUG: detect hba 4 at address = ffff88003bf90000 >> scsi4 : qla2xxx >> scsi(4): qla2x00_loop_resync() >> scsi(4): Asynchronous P2P MODE received. >> scsi(4): Asynchronous LOOP UP (2 Gbps). >> qla2xxx 0000:06:01.0: LOOP UP detected (2 Gbps). >> scsi(4): Asynchronous PORT UPDATE. >> scsi(4): Port database changed ffff 0006 0000. >> scsi(4): F/W Ready - OK >> scsi(4): fw_state=3 (3bb1, 8800, ffff, e540) curr time=10031b431. >> scsi(4): Configure loop -- dpc flags =0x112e1 >> scsi(4): RSCN queue entry[0] = [00/000000]. >> scsi(4): device_resync: rscn overflow. >> qla2xxx 0000:06:01.0: >> QLogic Fibre Channel HBA Driver: 8.03.01-k10-debug >> QLogic QLA2340 - >> ISP2312: PCI-X (100 MHz) @ 0000:06:01.0 hdma-, host#=4, fw=3.03.27 IPX >> qla2xxx 0000:06:02.0: PCI INT A -> GSI 78 (level, low) -> IRQ 78 >> qla2xxx 0000:06:02.0: Found an ISP2312, irq 78, iobase 0xffffc900010e4000 >> qla2xxx 0000:06:02.0: Configuring PCI space... >> qla2xxx 0000:06:02.0: Configure NVRAM parameters... >> qla2xxx 0000:06:02.0: Verifying loaded RISC code... >> scsi(5): **** Load RISC code **** >> scsi(5): Verifying Checksum of loaded RISC code. >> scsi(5): Checksum OK, start firmware. >> qla2xxx 0000:06:02.0: Allocated (412 KB) for firmware dump... >> scsi(5): Issue init firmware. >> DEBUG: detect hba 5 at address = ffff88003b000000 >> scsi5 : qla2xxx >> scsi(5): qla2x00_loop_resync() >> qla2xxx 0000:06:02.0: >> QLogic Fibre Channel HBA Driver: 8.03.01-k10-debug >> QLogic QLA2340 - >> ISP2312: PCI-X (100 MHz) @ 0000:06:02.0 hdma-, host#=5, fw=3.03.27 IPX >> scsi(4): RHBA failed, completion status (6). >> scsi(4): RFT_ID exiting normally. >> scsi(4): RFF_ID failed, rejected request: >> 0 1 2 3 4 5 6 7 8 9 Ah Bh Ch Dh Eh Fh >> -------------------------------------------------------------- >> 01 00 00 00 fc 02 00 00 80 01 00 00 00 0b 00 00 >> scsi(4): Register FC-4 Features failed. >> scsi(4): RNN_ID exiting normally. >> scsi(4): RSNN_NN exiting normally. >> scsi(4): GID_PT entry - nn 200000e08b07ae8f pn 210000e08b07ae8f >> portid=610913. >> scsi(4): GID_PT entry - nn 200000e08b17a9df pn 210000e08b17a9df >> portid=610813. >> scsi(4): GID_PT entry - nn 20000000c9294b5b pn 10000000c9294b5b >> portid=610413. >> scsi(4): GID_PT entry - nn 50001fe150004d10 pn 50001fe150004d19 >> portid=610613. >> scsi(4): GID_PT entry - nn 50001fe150004d10 pn 50001fe150004d1d >> portid=610513. >> scsi(4): GID_PT entry - nn 500508b300902930 pn 500508b300902939 >> portid=610713. >> scsi(4): device wrap (610713) >> scsi(4): LOOP READY >> scsi(4): qla2x00_loop_resync - end >> scsi(4:2): Async-login - loop-id=81 portid=610813 retries=0. >> scsi(4:3): Async-login - loop-id=82 portid=610413 retries=0. >> scsi(4:2): Async-login failed - status=30 mb0=4007 mb1=0 mb2=0 mb6=0 >> mb7=0. >> scsi(4:3): Async-login failed - status=30 mb0=4007 mb1=1 mb2=0 mb6=0 >> mb7=0. >> scsi(4:4): Async-login - loop-id=83 portid=610613 retries=0. >> scsi(4:5): Async-login - loop-id=84 portid=610513 retries=0. >> scsi(4:6): Async-login - loop-id=85 portid=610713 retries=0. >> scsi(4): qla2x00_reset_marker() >> scsi(4:4): Async-login complete - mbx1=4. >> scsi(4:5): Async-login complete - mbx1=4. >> scsi(4:6): Async-login complete - mbx1=4. >> scsi(4:8): Async-login - loop-id=0 portid=610813 retries=0. >> scsi(4:9): Async-login - loop-id=1 portid=610413 retries=0. >> scsi(4:8): Async-login failed - status=30 mb0=4000 mb1=3 mb2=0 mb6=0 >> mb7=0. >> scsi 4:0:0:0: RAID COMPAQ HSV111 (C)COMPAQ 4004 PQ: 0 ANSI: >> 5 >> >> >> >> reset: >> >> >> qla2xxx 0000:06:01.0: scsi(4:2:13): BUS RESET ISSUED. >> scsi(4): Asynchronous P2P MODE received. >> scsi(4): Asynchronous PORT UPDATE. >> scsi(4): Port database changed ffff 0006 0000. >> scsi(4): Asynchronous PORT UPDATE ignored 0001/0006/0000. >> scsi(4): Asynchronous PORT UPDATE ignored 0001/0007/0000. >> scsi(4): fcport-0 - port retry count: 15 remaining >> scsi(4): fcport-1 - port retry count: 15 remaining >> scsi(4): fcport-2 - port retry count: 15 remaining >> scsi(4): qla2x00_reset_marker() >> scsi(4): qla2x00_loop_resync() >> scsi(4): F/W Ready - OK >> scsi(4): fw_state=3 (3bb1, 8800, ffff, e540) curr time=10032f088. >> scsi(4): Configure loop -- dpc flags =0x1260 >> scsi(4): RSCN queue entry[0] = [00/000000]. >> scsi(4): device_resync: rscn overflow. >> scsi(4): fcport-0 - port retry count: 14 remaining >> scsi(4): fcport-1 - port retry count: 14 remaining >> scsi(4): fcport-2 - port retry count: 14 remaining >> scsi(4): fcport-0 - port retry count: 13 remaining >> scsi(4): fcport-1 - port retry count: 13 remaining >> scsi(4): fcport-2 - port retry count: 13 remaining >> scsi(4): fcport-0 - port retry count: 12 remaining >> scsi(4): fcport-1 - port retry count: 12 remaining >> scsi(4): fcport-2 - port retry count: 12 remaining >> scsi(4): fcport-0 - port retry count: 11 remaining >> scsi(4): fcport-1 - port retry count: 11 remaining >> scsi(4): fcport-2 - port retry count: 11 remaining >> scsi(4): fcport-0 - port retry count: 10 remaining >> scsi(4): fcport-1 - port retry count: 10 remaining >> scsi(4): fcport-2 - port retry count: 10 remaining >> scsi(4): fcport-0 - port retry count: 9 remaining >> scsi(4): fcport-1 - port retry count: 9 remaining >> scsi(4): fcport-2 - port retry count: 9 remaining >> scsi(4): fcport-0 - port retry count: 8 remaining >> scsi(4): fcport-1 - port retry count: 8 remaining >> scsi(4): fcport-2 - port retry count: 8 remaining >> scsi(4): fcport-0 - port retry count: 7 remaining >> scsi(4): fcport-1 - port retry count: 7 remaining >> scsi(4): fcport-2 - port retry count: 7 remaining >> scsi(4): fcport-0 - port retry count: 6 remaining >> scsi(4): fcport-1 - port retry count: 6 remaining >> scsi(4): fcport-2 - port retry count: 6 remaining >> scsi(4): fcport-0 - port retry count: 5 remaining >> scsi(4): fcport-1 - port retry count: 5 remaining >> scsi(4): fcport-2 - port retry count: 5 remaining >> scsi(4): fcport-0 - port retry count: 4 remaining >> scsi(4): fcport-1 - port retry count: 4 remaining >> scsi(4): fcport-2 - port retry count: 4 remaining >> scsi(4): fcport-0 - port retry count: 3 remaining >> scsi(4): fcport-1 - port retry count: 3 remaining >> scsi(4): fcport-2 - port retry count: 3 remaining >> scsi(4): fcport-0 - port retry count: 2 remaining >> scsi(4): fcport-1 - port retry count: 2 remaining >> scsi(4): fcport-2 - port retry count: 2 remaining >> scsi(4): fcport-0 - port retry count: 1 remaining >> scsi(4): fcport-1 - port retry count: 1 remaining >> scsi(4): fcport-2 - port retry count: 1 remaining >> scsi(4): fcport-0 - port retry count: 0 remaining >> scsi(4): fcport-1 - port retry count: 0 remaining >> scsi(4): fcport-2 - port retry count: 0 remaining >> scsi(4): RHBA failed, completion status (6). >> qla2x00_mailbox_command(4): **** FAILED. mbx0=4005, mbx1=82, mbx2=0, >> cmd=71 **** >> qla2x00_fabric_logout(4): failed=102 mbx1=82. >> scsi(4): RFT_ID exiting normally. >> qla2x00_mailbox_command(4): **** FAILED. mbx0=4005, mbx1=83, >> mbx2=ffff, cmd=71 **** >> qla2x00_fabric_logout(4): failed=102 mbx1=83. >> rport-4:0-2: blocked FC remote port time out: removing target and >> saving binding >> scsi(4): RFF_ID failed, rejected request: >> 0 1 2 3 4 5 6 7 8 9 Ah Bh Ch Dh Eh Fh >> -------------------------------------------------------------- >> 01 00 00 00 fc 02 00 00 80 01 00 00 00 0b 00 00 >> scsi(4): Register FC-4 Features failed. >> qla2x00_mailbox_command(4): **** FAILED. mbx0=4005, mbx1=84, mbx2=0, >> cmd=71 **** >> qla2x00_fabric_logout(4): failed=102 mbx1=84. >> scsi(4): RNN_ID exiting normally. >> scsi(4): RSNN_NN exiting normally. >> sd 4:0:2:1: [sdk] Synchronizing SCSI cache >> hd: sdk: remove path (uevent) >> scsi(4): GID_PT entry - nn 200000e08b07ae8f pn 210000e08b07ae8f >> portid=610913. >> scsi(4): GID_PT entry - nn 200000e08b17a9df pn 210000e08b17a9df >> portid=610813. >> scsi(4): GID_PT entry - nn 20000000c9294b5b pn 10000000c9294b5b >> portid=610413. >> scsi(4): GID_PT entry - nn 50001fe150004d10 pn 50001fe150004d19 >> portid=610613. >> scsi(4): GID_PT entry - nn 50001fe150004d10 pn 50001fe150004d1d >> portid=610513. >> scsi(4): GID_PT entry - nn 500508b300902930 pn 500508b300902939 >> portid=610713. >> qla2x00_mailbox_command(4): **** FAILED. mbx0=4005, mbx1=82, >> mbx2=8800, cmd=71 **** >> qla2x00_fabric_logout(4): failed=102 mbx1=82. >> qla2x00_mailbox_command(4): **** FAILED. mbx0=4005, mbx1=83, >> mbx2=8800, cmd=71 **** >> qla2x00_fabric_logout(4): failed=102 mbx1=83. >> qla2x00_mailbox_command(4): **** FAILED. mbx0=4005, mbx1=84, >> mbx2=8800, cmd=71 **** >> qla2x00_fabric_logout(4): failed=102 mbx1=84. >> scsi(4): device wrap (610713) >> scsi(4): LOOP READY >> scsi(4): qla2x00_loop_resync - end >> qla2x00_mailbox_command(4): **** FAILED. mbx0=4006, mbx1=82, mbx2=5, >> cmd=17 **** >> qla2x00_abort_target(4): failed=102. >> qla2x00_loop_reset(4): bus_reset failed: target_reset=258 d_id=610613. >> qla2x00_mailbox_command(4): **** FAILED. mbx0=4006, mbx1=83, mbx2=5, >> cmd=17 **** >> qla2x00_abort_target(4): failed=102. >> qla2x00_loop_reset(4): bus_reset failed: target_reset=258 d_id=610513. >> qla2x00_mailbox_command(4): **** FAILED. mbx0=4006, mbx1=84, mbx2=5, >> cmd=17 **** >> qla2x00_abort_target(4): failed=102. >> qla2x00_loop_reset(4): bus_reset failed: target_reset=258 d_id=610713. >> qla2xxx 0000:06:01.0: qla2xxx_eh_bus_reset: reset succeded >> scsi(4:2a9): Async-login - loop-id=81 portid=610813 retries=0. >> scsi(4:2aa): Async-login - loop-id=1 portid=610413 retries=0. >> scsi(4:2ab): Async-login - loop-id=82 portid=610613 retries=0. >> scsi(4:2ac): Async-login - loop-id=83 portid=610513 retries=0. >> scsi(4:2ad): Async-login - loop-id=84 portid=610713 retries=0. >> scsi(4:2a9): Async-login failed - status=30 mb0=4000 mb1=3 mb2=0 mb6=0 >> mb7=0. >> scsi(4:2aa): Async-login complete - mbx1=3. >> scsi(4): Port login retry: 210000e08b17a9df, id = 0x0081 retry cnt=16 >> scsi(4:2ab): Async-login complete - mbx1=4. >> scsi(4:2ac): Async-login complete - mbx1=4. >> scsi(4:2ad): Async-login complete - mbx1=4. >> sd 4:0:2:1: [sdk] Result: hostbyte=DID_NO_CONNECT driverbyte=DRIVER_OK >> sd 4:0:2:7: [sdq] Synchronizing SCSI cache > > Recently, we found an issue with async login stuff with 23xx adapters and > decided to disbale this feature for timebeing on 23xx to debug further. Can > u please try following patches and let me know whether this fix the problem. > I am in process of queuing these patches to linux-scsi shortly. > > -- Giri > > > From c40c40cab7ffa8c0710a7fba9fba572df9f38e83 Mon Sep 17 00:00:00 2001 > From: Giridhar Malavali <giridhar.malavali@xxxxxxxxxx> > Date: Sun, 9 May 2010 19:55:41 -0700 > Subject: [PATCH 7/9] qla2xxx: Limit rport-flaps during link-disruptions. > > Signed-off-by: Andrew Vasquez <andrew.vasquez@xxxxxxxxxx> > --- > drivers/scsi/qla2xxx/qla_init.c | 8 +++++--- > 1 files changed, 5 insertions(+), 3 deletions(-) > > diff --git a/drivers/scsi/qla2xxx/qla_init.c > b/drivers/scsi/qla2xxx/qla_init.c > index f546839..b33e6c0 100644 > --- a/drivers/scsi/qla2xxx/qla_init.c > +++ b/drivers/scsi/qla2xxx/qla_init.c > @@ -2651,7 +2651,8 @@ qla2x00_configure_loop(scsi_qla_host_t *vha) > set_bit(LOCAL_LOOP_UPDATE, &vha->dpc_flags); > if (test_bit(RSCN_UPDATE, &save_flags)) { > set_bit(RSCN_UPDATE, &vha->dpc_flags); > - vha->flags.rscn_queue_overflow = 1; > + if (!IS_ALOGIO_CAPABLE(ha)) > + vha->flags.rscn_queue_overflow = 1; > } > } > > @@ -3209,8 +3210,9 @@ qla2x00_find_all_fabric_devs(scsi_qla_host_t *vha, > if (qla2x00_is_reserved_id(vha, loop_id)) > continue; > > - if (atomic_read(&vha->loop_down_timer) || > - LOOP_TRANSITION(vha)) { > + if (ha->current_topology == ISP_CFG_FL && > + (atomic_read(&vha->loop_down_timer) || > + LOOP_TRANSITION(vha))) { > atomic_set(&vha->loop_down_timer, 0); > set_bit(LOOP_RESYNC_NEEDED, &vha->dpc_flags); > set_bit(LOCAL_LOOP_UPDATE, &vha->dpc_flags); > -- > 1.6.0.2 > > > From 4e449325ee1268c1e62e10287e810111f0edb57d Mon Sep 17 00:00:00 2001 > From: Andrew Vasquez <andrew.vasquez@xxxxxxxxxx> > Date: Fri, 14 May 2010 18:14:12 -0700 > Subject: [PATCH 8/9] qla2xxx: Disable async-logins on ISP23xx. > > Temporary workaround while additional investigation done by > the firmware group. > > Can be re-enabled with the ql2xasyncenable module parameter set > to 1. > --- > drivers/scsi/qla2xxx/qla_def.h | 3 ++- > drivers/scsi/qla2xxx/qla_gbl.h | 1 + > drivers/scsi/qla2xxx/qla_os.c | 6 ++++++ > 3 files changed, 9 insertions(+), 1 deletions(-) > > diff --git a/drivers/scsi/qla2xxx/qla_def.h b/drivers/scsi/qla2xxx/qla_def.h > index f8239bf..3ff3f5f 100644 > --- a/drivers/scsi/qla2xxx/qla_def.h > +++ b/drivers/scsi/qla2xxx/qla_def.h > @@ -2542,7 +2542,8 @@ struct qla_hw_data { > (ha)->flags.msix_enabled) > #define IS_FAC_REQUIRED(ha) (IS_QLA81XX(ha)) > #define IS_NOCACHE_VPD_TYPE(ha) (IS_QLA81XX(ha)) > -#define IS_ALOGIO_CAPABLE(ha) (IS_QLA23XX(ha) || IS_FWI2_CAPABLE(ha)) > +#define IS_ALOGIO_CAPABLE(ha) ((IS_QLA23XX(ha) && ql2xasyncenable) || \ > + IS_FWI2_CAPABLE(ha)) > > #define IS_IIDMA_CAPABLE(ha) ((ha)->device_type & DT_IIDMA) > #define IS_FWI2_CAPABLE(ha) ((ha)->device_type & DT_FWI2) > diff --git a/drivers/scsi/qla2xxx/qla_gbl.h b/drivers/scsi/qla2xxx/qla_gbl.h > index 2d3c540..2161e09 100644 > --- a/drivers/scsi/qla2xxx/qla_gbl.h > +++ b/drivers/scsi/qla2xxx/qla_gbl.h > @@ -98,6 +98,7 @@ extern int ql2xenabledif; > extern int ql2xenablehba_err_chk; > extern int ql2xtargetreset; > extern int ql2xfwloadalways; > +extern int ql2xasyncenable; > > extern int qla2x00_loop_reset(scsi_qla_host_t *); > extern void qla2x00_abort_all_cmds(scsi_qla_host_t *, int); > diff --git a/drivers/scsi/qla2xxx/qla_os.c b/drivers/scsi/qla2xxx/qla_os.c > index 7400456..cb76059 100644 > --- a/drivers/scsi/qla2xxx/qla_os.c > +++ b/drivers/scsi/qla2xxx/qla_os.c > @@ -31,6 +31,12 @@ static int apidev_major; > */ > static struct kmem_cache *srb_cachep; > > +int ql2xasyncenable; > +module_param(ql2xasyncenable, int, S_IRUGO|S_IRUSR); > +MODULE_PARM_DESC(ql2xasyncenable, > + "Enables usage of Async-login/adisc on ISP23xx." > + "Default is 0 - use serialized mailbox commands."); > + > /* > * CT6 CTX allocation cache > */ > -- > 1.6.0.2 > > > > >> >> On Tue, May 18, 2010 at 12:56 PM, Giridhar Malavali >> <giridhar.malavali@xxxxxxxxxx> wrote: >>> >>> >>> On Tue, 18 May 2010, Eddie Williams wrote: >>> >>>> I upgraded an older EM64T server with 2 Qlogic controllers (QLA2340 >>>> FW:v3.03.27 DVR:v8.03.01-k10) to SLES 11 and when a SCSI Bus Reset >>>> (Loop Reset) is issued I get this time out where the driver ends up >>>> removing all devices (causing IO failures for mounted file systems and >>>> the like) and then rescaning to add the devices back. I verified that >>>> with the latest stable kernel (2.6.33.4) I see similar behavior >>>> (though the new kernel recovery much quicker). If I go back to an >>>> older boot disk with Red Hat EL 4 that kernel/driver will issue a >>>> reset without any issues, aka no port time out or loss of connection >>>> to the storage. >>>> >>>> I have tried similar tests on different servers connected to the same >>>> FC switch/storage as well as servers connected to different storage >>>> without seeing this behavior. >>>> >>> >>> Thanks for bringing this to attention. Is it possible to re-run the test >>> with extended logging (ql2xextended_error_logging=1). This will help to >>> debug further. >>> >>> -- Giri >>> >>> >>>> output to /var/log/messages: >>>> >>>> May 14 11:18:14 goofy kernel: qla2xxx 0000:06:01.0: scsi(2:2:15): LOOP >>>> RESET ISSUED. >>>> May 14 11:18:34 goofy kernel: rport-2:0-4: blocked FC remote port >>>> time out: removing target and saving binding >>>> May 14 11:18:34 goofy kernel: sd 2:0:2:1: [sdv] Synchronizing SCSI cache >>>> May 14 11:18:34 goofy multipathd: sdv: remove path (uevent) >>>> May 14 11:19:04 goofy kernel: rport-2:0-2: blocked FC remote port >>>> time out: removing target and saving binding >>>> May 14 11:19:04 goofy kernel: rport-2:0-3: blocked FC remote port >>>> time out: removing target and saving binding >>>> May 14 11:19:14 goofy kernel: rport-2:0-6: blocked FC remote port >>>> time out: removing rport >>>> May 14 11:20:22 goofy LifeKeeper[8115]: RECOVERY class=disk >>>> event=inqfail name=3600508b300902930a83a23d1a2850036 STARTING AT: Fri >>>> May 14 11:20:22 EDT 2010 >>>> May 14 11:20:23 goofy LifeKeeper[8115]: RECOVERY class=disk >>>> event=inqfail name=3600508b300902930a83a23d1a2850036 ENDING AT: Fri >>>> May 14 11:20:23 EDT 2010 >>>> May 14 11:23:14 goofy kernel: qla2xxx 0000:06:01.0: >>>> qla2xxx_eh_bus_reset: reset succeded >>>> >>>> >>>> What is causing this time out and how can I avoid it? >>>> >>>> Eddie Williams >>>> -- >>>> To unsubscribe from this list: send the line "unsubscribe linux-scsi" in >>>> the body of a message to majordomo@xxxxxxxxxxxxxxx >>>> More majordomo info at http://vger.kernel.org/majordomo-info.html >> >> -- >> To unsubscribe from this list: send the line "unsubscribe linux-scsi" in >> the body of a message to majordomo@xxxxxxxxxxxxxxx >> More majordomo info at http://vger.kernel.org/majordomo-info.html >> > -- To unsubscribe from this list: send the line "unsubscribe linux-scsi" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html