Re: qla2xxx - blocked FC remote port time out: removing target and saving binding

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 





On Tue, 18 May 2010, Eddie Williams wrote:

Here is a snipit of the log with the timestamp cut off to avoid making
the mail so large.  I can send you the full log directly if you like.

driver load:
QLogic Fibre Channel HBA Driver: 8.03.01-k10-debug
qla2xxx 0000:06:01.0: PCI INT A -> GSI 74 (level, low) -> IRQ 74
qla2xxx 0000:06:01.0: Found an ISP2312, irq 74, iobase 0xffffc90000dba000
qla2xxx 0000:06:01.0: Configuring PCI space...
qla2xxx 0000:06:01.0: Configure NVRAM parameters...
qla2xxx 0000:06:01.0: Verifying loaded RISC code...
scsi(4): **** Load RISC code ****
qla2xxx 0000:06:01.0: firmware: requesting ql2300_fw.bin
scsi(4): Verifying Checksum of loaded RISC code.
scsi(4): Checksum OK, start firmware.
qla2xxx 0000:06:01.0: Allocated (412 KB) for firmware dump...
scsi(4): Issue init firmware.
DEBUG: detect hba 4 at address = ffff88003bf90000
scsi4 : qla2xxx
scsi(4): qla2x00_loop_resync()
scsi(4): Asynchronous P2P MODE received.
scsi(4): Asynchronous LOOP UP (2 Gbps).
qla2xxx 0000:06:01.0: LOOP UP detected (2 Gbps).
scsi(4): Asynchronous PORT UPDATE.
scsi(4): Port database changed ffff 0006 0000.
scsi(4): F/W Ready - OK
scsi(4): fw_state=3 (3bb1, 8800, ffff, e540) curr time=10031b431.
scsi(4): Configure loop -- dpc flags =0x112e1
scsi(4): RSCN queue entry[0] = [00/000000].
scsi(4): device_resync: rscn overflow.
qla2xxx 0000:06:01.0:
QLogic Fibre Channel HBA Driver: 8.03.01-k10-debug
 QLogic QLA2340 -
 ISP2312: PCI-X (100 MHz) @ 0000:06:01.0 hdma-, host#=4, fw=3.03.27 IPX
qla2xxx 0000:06:02.0: PCI INT A -> GSI 78 (level, low) -> IRQ 78
qla2xxx 0000:06:02.0: Found an ISP2312, irq 78, iobase 0xffffc900010e4000
qla2xxx 0000:06:02.0: Configuring PCI space...
qla2xxx 0000:06:02.0: Configure NVRAM parameters...
qla2xxx 0000:06:02.0: Verifying loaded RISC code...
scsi(5): **** Load RISC code ****
scsi(5): Verifying Checksum of loaded RISC code.
scsi(5): Checksum OK, start firmware.
qla2xxx 0000:06:02.0: Allocated (412 KB) for firmware dump...
scsi(5): Issue init firmware.
DEBUG: detect hba 5 at address = ffff88003b000000
scsi5 : qla2xxx
scsi(5): qla2x00_loop_resync()
qla2xxx 0000:06:02.0:
QLogic Fibre Channel HBA Driver: 8.03.01-k10-debug
 QLogic QLA2340 -
 ISP2312: PCI-X (100 MHz) @ 0000:06:02.0 hdma-, host#=5, fw=3.03.27 IPX
scsi(4): RHBA failed, completion status (6).
scsi(4): RFT_ID exiting normally.
scsi(4): RFF_ID failed, rejected request:
0   1   2   3   4   5   6   7   8   9  Ah  Bh  Ch  Dh  Eh  Fh
--------------------------------------------------------------
01  00  00  00  fc  02  00  00  80  01  00  00  00  0b  00  00
scsi(4): Register FC-4 Features failed.
scsi(4): RNN_ID exiting normally.
scsi(4): RSNN_NN exiting normally.
scsi(4): GID_PT entry - nn 200000e08b07ae8f pn 210000e08b07ae8f portid=610913.
scsi(4): GID_PT entry - nn 200000e08b17a9df pn 210000e08b17a9df portid=610813.
scsi(4): GID_PT entry - nn 20000000c9294b5b pn 10000000c9294b5b portid=610413.
scsi(4): GID_PT entry - nn 50001fe150004d10 pn 50001fe150004d19 portid=610613.
scsi(4): GID_PT entry - nn 50001fe150004d10 pn 50001fe150004d1d portid=610513.
scsi(4): GID_PT entry - nn 500508b300902930 pn 500508b300902939 portid=610713.
scsi(4): device wrap (610713)
scsi(4): LOOP READY
scsi(4): qla2x00_loop_resync - end
scsi(4:2): Async-login - loop-id=81 portid=610813 retries=0.
scsi(4:3): Async-login - loop-id=82 portid=610413 retries=0.
scsi(4:2): Async-login failed - status=30 mb0=4007 mb1=0 mb2=0 mb6=0 mb7=0.
scsi(4:3): Async-login failed - status=30 mb0=4007 mb1=1 mb2=0 mb6=0 mb7=0.
scsi(4:4): Async-login - loop-id=83 portid=610613 retries=0.
scsi(4:5): Async-login - loop-id=84 portid=610513 retries=0.
scsi(4:6): Async-login - loop-id=85 portid=610713 retries=0.
scsi(4): qla2x00_reset_marker()
scsi(4:4): Async-login complete - mbx1=4.
scsi(4:5): Async-login complete - mbx1=4.
scsi(4:6): Async-login complete - mbx1=4.
scsi(4:8): Async-login - loop-id=0 portid=610813 retries=0.
scsi(4:9): Async-login - loop-id=1 portid=610413 retries=0.
scsi(4:8): Async-login failed - status=30 mb0=4000 mb1=3 mb2=0 mb6=0 mb7=0.
scsi 4:0:0:0: RAID              COMPAQ   HSV111 (C)COMPAQ 4004 PQ: 0 ANSI: 5



reset:


qla2xxx 0000:06:01.0: scsi(4:2:13): BUS RESET ISSUED.
scsi(4): Asynchronous P2P MODE received.
scsi(4): Asynchronous PORT UPDATE.
scsi(4): Port database changed ffff 0006 0000.
scsi(4): Asynchronous PORT UPDATE ignored 0001/0006/0000.
scsi(4): Asynchronous PORT UPDATE ignored 0001/0007/0000.
scsi(4): fcport-0 - port retry count: 15 remaining
scsi(4): fcport-1 - port retry count: 15 remaining
scsi(4): fcport-2 - port retry count: 15 remaining
scsi(4): qla2x00_reset_marker()
scsi(4): qla2x00_loop_resync()
scsi(4): F/W Ready - OK
scsi(4): fw_state=3 (3bb1, 8800, ffff, e540) curr time=10032f088.
scsi(4): Configure loop -- dpc flags =0x1260
scsi(4): RSCN queue entry[0] = [00/000000].
scsi(4): device_resync: rscn overflow.
scsi(4): fcport-0 - port retry count: 14 remaining
scsi(4): fcport-1 - port retry count: 14 remaining
scsi(4): fcport-2 - port retry count: 14 remaining
scsi(4): fcport-0 - port retry count: 13 remaining
scsi(4): fcport-1 - port retry count: 13 remaining
scsi(4): fcport-2 - port retry count: 13 remaining
scsi(4): fcport-0 - port retry count: 12 remaining
scsi(4): fcport-1 - port retry count: 12 remaining
scsi(4): fcport-2 - port retry count: 12 remaining
scsi(4): fcport-0 - port retry count: 11 remaining
scsi(4): fcport-1 - port retry count: 11 remaining
scsi(4): fcport-2 - port retry count: 11 remaining
scsi(4): fcport-0 - port retry count: 10 remaining
scsi(4): fcport-1 - port retry count: 10 remaining
scsi(4): fcport-2 - port retry count: 10 remaining
scsi(4): fcport-0 - port retry count: 9 remaining
scsi(4): fcport-1 - port retry count: 9 remaining
scsi(4): fcport-2 - port retry count: 9 remaining
scsi(4): fcport-0 - port retry count: 8 remaining
scsi(4): fcport-1 - port retry count: 8 remaining
scsi(4): fcport-2 - port retry count: 8 remaining
scsi(4): fcport-0 - port retry count: 7 remaining
scsi(4): fcport-1 - port retry count: 7 remaining
scsi(4): fcport-2 - port retry count: 7 remaining
scsi(4): fcport-0 - port retry count: 6 remaining
scsi(4): fcport-1 - port retry count: 6 remaining
scsi(4): fcport-2 - port retry count: 6 remaining
scsi(4): fcport-0 - port retry count: 5 remaining
scsi(4): fcport-1 - port retry count: 5 remaining
scsi(4): fcport-2 - port retry count: 5 remaining
scsi(4): fcport-0 - port retry count: 4 remaining
scsi(4): fcport-1 - port retry count: 4 remaining
scsi(4): fcport-2 - port retry count: 4 remaining
scsi(4): fcport-0 - port retry count: 3 remaining
scsi(4): fcport-1 - port retry count: 3 remaining
scsi(4): fcport-2 - port retry count: 3 remaining
scsi(4): fcport-0 - port retry count: 2 remaining
scsi(4): fcport-1 - port retry count: 2 remaining
scsi(4): fcport-2 - port retry count: 2 remaining
scsi(4): fcport-0 - port retry count: 1 remaining
scsi(4): fcport-1 - port retry count: 1 remaining
scsi(4): fcport-2 - port retry count: 1 remaining
scsi(4): fcport-0 - port retry count: 0 remaining
scsi(4): fcport-1 - port retry count: 0 remaining
scsi(4): fcport-2 - port retry count: 0 remaining
scsi(4): RHBA failed, completion status (6).
qla2x00_mailbox_command(4): **** FAILED. mbx0=4005, mbx1=82, mbx2=0, cmd=71 ****
qla2x00_fabric_logout(4): failed=102 mbx1=82.
scsi(4): RFT_ID exiting normally.
qla2x00_mailbox_command(4): **** FAILED. mbx0=4005, mbx1=83,
mbx2=ffff, cmd=71 ****
qla2x00_fabric_logout(4): failed=102 mbx1=83.
rport-4:0-2: blocked FC remote port time out: removing target and
saving binding
scsi(4): RFF_ID failed, rejected request:
0   1   2   3   4   5   6   7   8   9  Ah  Bh  Ch  Dh  Eh  Fh
--------------------------------------------------------------
01  00  00  00  fc  02  00  00  80  01  00  00  00  0b  00  00
scsi(4): Register FC-4 Features failed.
qla2x00_mailbox_command(4): **** FAILED. mbx0=4005, mbx1=84, mbx2=0, cmd=71 ****
qla2x00_fabric_logout(4): failed=102 mbx1=84.
scsi(4): RNN_ID exiting normally.
scsi(4): RSNN_NN exiting normally.
sd 4:0:2:1: [sdk] Synchronizing SCSI cache
hd: sdk: remove path (uevent)
scsi(4): GID_PT entry - nn 200000e08b07ae8f pn 210000e08b07ae8f portid=610913.
scsi(4): GID_PT entry - nn 200000e08b17a9df pn 210000e08b17a9df portid=610813.
scsi(4): GID_PT entry - nn 20000000c9294b5b pn 10000000c9294b5b portid=610413.
scsi(4): GID_PT entry - nn 50001fe150004d10 pn 50001fe150004d19 portid=610613.
scsi(4): GID_PT entry - nn 50001fe150004d10 pn 50001fe150004d1d portid=610513.
scsi(4): GID_PT entry - nn 500508b300902930 pn 500508b300902939 portid=610713.
qla2x00_mailbox_command(4): **** FAILED. mbx0=4005, mbx1=82,
mbx2=8800, cmd=71 ****
qla2x00_fabric_logout(4): failed=102 mbx1=82.
qla2x00_mailbox_command(4): **** FAILED. mbx0=4005, mbx1=83,
mbx2=8800, cmd=71 ****
qla2x00_fabric_logout(4): failed=102 mbx1=83.
qla2x00_mailbox_command(4): **** FAILED. mbx0=4005, mbx1=84,
mbx2=8800, cmd=71 ****
qla2x00_fabric_logout(4): failed=102 mbx1=84.
scsi(4): device wrap (610713)
scsi(4): LOOP READY
scsi(4): qla2x00_loop_resync - end
qla2x00_mailbox_command(4): **** FAILED. mbx0=4006, mbx1=82, mbx2=5, cmd=17 ****
qla2x00_abort_target(4): failed=102.
qla2x00_loop_reset(4): bus_reset failed: target_reset=258 d_id=610613.
qla2x00_mailbox_command(4): **** FAILED. mbx0=4006, mbx1=83, mbx2=5, cmd=17 ****
qla2x00_abort_target(4): failed=102.
qla2x00_loop_reset(4): bus_reset failed: target_reset=258 d_id=610513.
qla2x00_mailbox_command(4): **** FAILED. mbx0=4006, mbx1=84, mbx2=5, cmd=17 ****
qla2x00_abort_target(4): failed=102.
qla2x00_loop_reset(4): bus_reset failed: target_reset=258 d_id=610713.
qla2xxx 0000:06:01.0: qla2xxx_eh_bus_reset: reset succeded
scsi(4:2a9): Async-login - loop-id=81 portid=610813 retries=0.
scsi(4:2aa): Async-login - loop-id=1 portid=610413 retries=0.
scsi(4:2ab): Async-login - loop-id=82 portid=610613 retries=0.
scsi(4:2ac): Async-login - loop-id=83 portid=610513 retries=0.
scsi(4:2ad): Async-login - loop-id=84 portid=610713 retries=0.
scsi(4:2a9): Async-login failed - status=30 mb0=4000 mb1=3 mb2=0 mb6=0 mb7=0.
scsi(4:2aa): Async-login complete - mbx1=3.
scsi(4): Port login retry: 210000e08b17a9df, id = 0x0081 retry cnt=16
scsi(4:2ab): Async-login complete - mbx1=4.
scsi(4:2ac): Async-login complete - mbx1=4.
scsi(4:2ad): Async-login complete - mbx1=4.
sd 4:0:2:1: [sdk] Result: hostbyte=DID_NO_CONNECT driverbyte=DRIVER_OK
sd 4:0:2:7: [sdq] Synchronizing SCSI cache

Recently, we found an issue with async login stuff with 23xx adapters and decided to disbale this feature for timebeing on 23xx to debug further. Can u please try following patches and let me know whether this fix the problem. I am in process of queuing these patches to linux-scsi shortly.

-- Giri


From c40c40cab7ffa8c0710a7fba9fba572df9f38e83 Mon Sep 17 00:00:00 2001
From: Giridhar Malavali <giridhar.malavali@xxxxxxxxxx>
Date: Sun, 9 May 2010 19:55:41 -0700
Subject: [PATCH 7/9] qla2xxx: Limit rport-flaps during link-disruptions.

Signed-off-by: Andrew Vasquez <andrew.vasquez@xxxxxxxxxx>
---
 drivers/scsi/qla2xxx/qla_init.c |    8 +++++---
 1 files changed, 5 insertions(+), 3 deletions(-)

diff --git a/drivers/scsi/qla2xxx/qla_init.c b/drivers/scsi/qla2xxx/qla_init.c
index f546839..b33e6c0 100644
--- a/drivers/scsi/qla2xxx/qla_init.c
+++ b/drivers/scsi/qla2xxx/qla_init.c
@@ -2651,7 +2651,8 @@ qla2x00_configure_loop(scsi_qla_host_t *vha)
 			set_bit(LOCAL_LOOP_UPDATE, &vha->dpc_flags);
 		if (test_bit(RSCN_UPDATE, &save_flags)) {
 			set_bit(RSCN_UPDATE, &vha->dpc_flags);
-			vha->flags.rscn_queue_overflow = 1;
+			if (!IS_ALOGIO_CAPABLE(ha))
+				vha->flags.rscn_queue_overflow = 1;
 		}
 	}

@@ -3209,8 +3210,9 @@ qla2x00_find_all_fabric_devs(scsi_qla_host_t *vha,
 		if (qla2x00_is_reserved_id(vha, loop_id))
 			continue;

-		if (atomic_read(&vha->loop_down_timer) ||
-		    LOOP_TRANSITION(vha)) {
+		if (ha->current_topology == ISP_CFG_FL &&
+		    (atomic_read(&vha->loop_down_timer) ||
+		     LOOP_TRANSITION(vha))) {
 			atomic_set(&vha->loop_down_timer, 0);
 			set_bit(LOOP_RESYNC_NEEDED, &vha->dpc_flags);
 			set_bit(LOCAL_LOOP_UPDATE, &vha->dpc_flags);
--
1.6.0.2


From 4e449325ee1268c1e62e10287e810111f0edb57d Mon Sep 17 00:00:00 2001
From: Andrew Vasquez <andrew.vasquez@xxxxxxxxxx>
Date: Fri, 14 May 2010 18:14:12 -0700
Subject: [PATCH 8/9] qla2xxx: Disable async-logins on ISP23xx.

Temporary workaround while additional investigation done by
the firmware group.

Can be re-enabled with the ql2xasyncenable module parameter set
to 1.
---
 drivers/scsi/qla2xxx/qla_def.h |    3 ++-
 drivers/scsi/qla2xxx/qla_gbl.h |    1 +
 drivers/scsi/qla2xxx/qla_os.c  |    6 ++++++
 3 files changed, 9 insertions(+), 1 deletions(-)

diff --git a/drivers/scsi/qla2xxx/qla_def.h b/drivers/scsi/qla2xxx/qla_def.h
index f8239bf..3ff3f5f 100644
--- a/drivers/scsi/qla2xxx/qla_def.h
+++ b/drivers/scsi/qla2xxx/qla_def.h
@@ -2542,7 +2542,8 @@ struct qla_hw_data {
 				(ha)->flags.msix_enabled)
 #define IS_FAC_REQUIRED(ha)	(IS_QLA81XX(ha))
 #define IS_NOCACHE_VPD_TYPE(ha)	(IS_QLA81XX(ha))
-#define IS_ALOGIO_CAPABLE(ha)	(IS_QLA23XX(ha) || IS_FWI2_CAPABLE(ha))
+#define IS_ALOGIO_CAPABLE(ha)	((IS_QLA23XX(ha) && ql2xasyncenable) || \
+				IS_FWI2_CAPABLE(ha))

 #define IS_IIDMA_CAPABLE(ha)    ((ha)->device_type & DT_IIDMA)
 #define IS_FWI2_CAPABLE(ha)     ((ha)->device_type & DT_FWI2)
diff --git a/drivers/scsi/qla2xxx/qla_gbl.h b/drivers/scsi/qla2xxx/qla_gbl.h
index 2d3c540..2161e09 100644
--- a/drivers/scsi/qla2xxx/qla_gbl.h
+++ b/drivers/scsi/qla2xxx/qla_gbl.h
@@ -98,6 +98,7 @@ extern int ql2xenabledif;
 extern int ql2xenablehba_err_chk;
 extern int ql2xtargetreset;
 extern int ql2xfwloadalways;
+extern int ql2xasyncenable;

 extern int qla2x00_loop_reset(scsi_qla_host_t *);
 extern void qla2x00_abort_all_cmds(scsi_qla_host_t *, int);
diff --git a/drivers/scsi/qla2xxx/qla_os.c b/drivers/scsi/qla2xxx/qla_os.c
index 7400456..cb76059 100644
--- a/drivers/scsi/qla2xxx/qla_os.c
+++ b/drivers/scsi/qla2xxx/qla_os.c
@@ -31,6 +31,12 @@ static int apidev_major;
  */
 static struct kmem_cache *srb_cachep;

+int ql2xasyncenable;
+module_param(ql2xasyncenable, int, S_IRUGO|S_IRUSR);
+MODULE_PARM_DESC(ql2xasyncenable,
+		"Enables usage of Async-login/adisc on ISP23xx."
+		"Default is 0 - use serialized mailbox commands.");
+
 /*
  * CT6 CTX allocation cache
  */
--
1.6.0.2





On Tue, May 18, 2010 at 12:56 PM, Giridhar Malavali
<giridhar.malavali@xxxxxxxxxx> wrote:


On Tue, 18 May 2010, Eddie Williams wrote:

I upgraded an older EM64T server with 2 Qlogic controllers (QLA2340
FW:v3.03.27 DVR:v8.03.01-k10) to SLES 11 and when a SCSI Bus Reset
(Loop Reset) is issued I get this time out where the driver ends up
removing all devices (causing IO failures for mounted file systems and
the like) and then rescaning to add the devices back.  I verified that
with the latest stable kernel (2.6.33.4) I see similar behavior
(though the new kernel recovery much quicker).  If I go back to an
older boot disk with Red Hat EL 4 that kernel/driver will issue a
reset without any issues, aka no port time out or loss of connection
to the storage.

I have tried similar tests on different servers connected to the same
FC switch/storage as well as servers connected to different storage
without seeing this behavior.


Thanks for bringing this to attention. Is it possible to re-run the test
with extended logging (ql2xextended_error_logging=1). This will help to
debug further.

-- Giri


output to /var/log/messages:

May 14 11:18:14 goofy kernel: qla2xxx 0000:06:01.0: scsi(2:2:15): LOOP
RESET ISSUED.
May 14 11:18:34 goofy kernel:  rport-2:0-4: blocked FC remote port
time out: removing target and saving binding
May 14 11:18:34 goofy kernel: sd 2:0:2:1: [sdv] Synchronizing SCSI cache
May 14 11:18:34 goofy multipathd: sdv: remove path (uevent)
May 14 11:19:04 goofy kernel:  rport-2:0-2: blocked FC remote port
time out: removing target and saving binding
May 14 11:19:04 goofy kernel:  rport-2:0-3: blocked FC remote port
time out: removing target and saving binding
May 14 11:19:14 goofy kernel:  rport-2:0-6: blocked FC remote port
time out: removing rport
May 14 11:20:22 goofy LifeKeeper[8115]: RECOVERY class=disk
event=inqfail name=3600508b300902930a83a23d1a2850036 STARTING AT: Fri
May 14 11:20:22 EDT 2010
May 14 11:20:23 goofy LifeKeeper[8115]: RECOVERY class=disk
event=inqfail name=3600508b300902930a83a23d1a2850036 ENDING AT: Fri
May 14 11:20:23 EDT 2010
May 14 11:23:14 goofy kernel: qla2xxx 0000:06:01.0:
qla2xxx_eh_bus_reset: reset succeded


What is causing this time out and how can I avoid it?

Eddie Williams
--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html
--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html

--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
[Index of Archives]     [SCSI Target Devel]     [Linux SCSI Target Infrastructure]     [Kernel Newbies]     [IDE]     [Security]     [Git]     [Netfilter]     [Bugtraq]     [Yosemite News]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Linux ATA RAID]     [Linux IIO]     [Samba]     [Device Mapper]
  Powered by Linux