Hey Everyone,
In a basic SAN config using a few QLE2432 adapters connected to a target
using a QLE2464 adapter via a EMC Brocade 5000B switch, one of the
initiators fails to present a LUN after being online for sometime. Over
a period of time, few months, the initiator on one of the hosts stops
showing available LUN's to the underlying VMware clients.
All the other hosts are fine, except for this specific one. Unless that
is, I reboot the target completely, affecting all the other working
hosts in the process.
Digging a bit closer, I notice that the issue seems to strike an uncanny
similarity to the following one:
ht tps://www.spinics.net/lists/linux-scsi/msg136622.html
However, I'm wondering why only one of the servers is affected and not
the others? Seems it is a card issue with the first host (please see
image) however I'm not familiar with all the messages printed so can't
be sure of the reason nor link things as effectively. Neither of the two
ports of the HBA on Server 1 work, when in this disconnected state,
despite switching SFP's on the Brocade switch, switching cables etc.
That is, again, until I reboot the target entirely.
Initiator:
50:01:43:80:16:77:99:38; 50:01:43:80:16:77:99:3a;
50:01:43:80:16:77:99:3b; 50:01:43:80:16:77:99:39
Target:
(see below)
This also started to happen a few months ago. Everything was fine for a
few years before that.
Log snippet:
Mar 28 02:40:24 mbpc-pc kernel: qla2xxx [0000:04:00.0]-e818:
is_send_status=1, cmd->bufflen=73728, cmd->sg_cnt=0,
cmd->dma_data_direction=2 se_cmd[00000000cc1dc466] qp 0
Mar 28 02:40:24 mbpc-pc kernel: qla2xxx [0000:04:00.0]-e874:2:
qlt_free_cmd: se_cmd[00000000cc1dc466] ox_id 00e1
Mar 28 02:40:24 mbpc-pc kernel: qla2xxx [0000:04:00.0]-e872:2:
qlt_24xx_atio_pkt_all_vps: qla_target(0): type 6 ox_id 0110
Mar 28 02:40:24 mbpc-pc kernel: qla2xxx [0000:04:00.0]-e818:
is_send_status=1, cmd->bufflen=10240, cmd->sg_cnt=0,
cmd->dma_data_direction=2 se_cmd[00000000cc1dc466] qp 0
Mar 28 02:40:24 mbpc-pc kernel: qla2xxx [0000:04:00.0]-e874:2:
qlt_free_cmd: se_cmd[00000000cc1dc466] ox_id 0110
Mar 28 02:40:24 mbpc-pc kernel: qla2xxx [0000:04:00.0]-e872:2:
qlt_24xx_atio_pkt_all_vps: qla_target(0): type d ox_id 0000
Mar 28 02:40:24 mbpc-pc kernel: qla2xxx [0000:04:00.0]-e82e:2:
IMMED_NOTIFY ATIO
Mar 28 02:40:24 mbpc-pc kernel: qla2xxx [0000:04:00.0]-f826:2:
qla_target(0): Port ID: 01:09:00 ELS opcode: 0x03 lid 7
50:01:43:80:16:77:99:3a
Mar 28 02:40:24 mbpc-pc kernel: qla2xxx [0000:04:00.0]-f897:2: Linking
sess 00000000088a6dfa [0] wwn 50:01:43:80:16:77:99:3a with PLOGI ACK to
wwn 50:01:43:80:16:77:99:3a s_id 01:09:00, ref=1 pla 000000004bdf1d76 link 0
Mar 28 02:40:24 mbpc-pc kernel: qla2xxx [0000:04:00.0]-28f9:2:
qlt_handle_login 4772 50:01:43:80:16:77:99:3a DS 8
Mar 28 02:40:24 mbpc-pc kernel: qla2xxx [0000:04:00.0]-28f9:2:
qlt_handle_login 4803 50:01:43:80:16:77:99:3a post del sess
Mar 28 02:40:24 mbpc-pc kernel: qla2xxx [0000:04:00.0]-e801:2:
Scheduling sess 00000000088a6dfa for deletion
Mar 28 02:40:24 mbpc-pc kernel: qla2xxx [0000:04:00.0]-f826:2:
qla_target(0): Exit ELS opcode: 0x03 res 0
Mar 28 02:40:24 mbpc-pc kernel: qla2xxx [0000:04:00.0]-290a:2:
qlt_unreg_sess sess 00000000088a6dfa for deletion 50:01:43:80:16:77:99:3a
Mar 28 02:40:24 mbpc-pc kernel: qla2xxx [0000:04:00.0]-287d:2: FCPort
50:01:43:80:16:77:99:3a state transitioned from ONLINE to LOST -
portid=010900.
Mar 28 02:40:24 mbpc-pc kernel: qla2xxx [0000:04:00.0]-28a3:2: Port
login retry 500143801677993a, lid 0x0007 retry cnt=45.
Mar 28 02:40:24 mbpc-pc kernel: qla2xxx [0000:04:00.0]-f884:2:
qlt_free_session_done: se_sess 000000001f2eac78 / sess 00000000088a6dfa
from port 50:01:43:80:16:77:99:3a loop_id 0x07 s_id 01:09:00 logout 1
keep 1 els_logo 0
Mar 28 02:40:24 mbpc-pc kernel: qla2xxx [0000:04:00.0]-f886:2:
qlt_free_session_done: waiting for sess 00000000088a6dfa logout
Mar 28 02:40:24 mbpc-pc kernel: qla2xxx [0000:04:00.0]-2870:2:
Async-logout - hdl=172 loop-id=7 portid=010900 50:01:43:80:16:77:99:3a.
Mar 28 02:40:24 mbpc-pc kernel: qla2xxx [0000:04:00.0]-5836:2:
Async-logout complete - 50:01:43:80:16:77:99:3a hdl=172 portid=010900
iop0=0.
Mar 28 02:40:24 mbpc-pc kernel: qla2xxx [0000:04:00.0]-f893:2:
qlt_logo_completion_handler: se_sess 000000001f2eac78 / sess
00000000088a6dfa from port 50:01:43:80:16:77:99:3a loop_id 0x07 s_id
01:09:00 LOGO failed: 0x0
Mar 28 02:40:24 mbpc-pc kernel: qla2xxx [0000:04:00.0]-e872:2:
qlt_24xx_atio_pkt_all_vps: qla_target(0): type 6 ox_id 008a
Mar 28 02:40:24 mbpc-pc kernel: qla2xxx [0000:04:00.0]-e818:
is_send_status=1, cmd->bufflen=4096, cmd->sg_cnt=0,
cmd->dma_data_direction=2 se_cmd[00000000cc1dc466] qp 0
Mar 28 02:40:24 mbpc-pc kernel: qla2xxx [0000:04:00.0]-e874:2:
qlt_free_cmd: se_cmd[00000000cc1dc466] ox_id 008a
Mar 28 02:40:24 mbpc-pc kernel: qla2xxx [0000:04:00.0]-680b:2:
isp_abort_needed=0 loop_resync_needed=0 fcport_update_needed=0
start_dpc=0 reset_marker_needed=0
Mar 28 02:40:24 mbpc-pc kernel: qla2xxx [0000:04:00.0]-680c:2:
beacon_blink_needed=0 isp_unrecoverable=0 fcoe_ctx_reset_needed=0
vp_dpc_needed=0 relogin_needed=1.
Mar 28 02:40:24 mbpc-pc kernel: qla2xxx [0000:04:00.0]-4801:2: DPC
handler waking up, dpc_flags=0x100.
Mar 28 02:40:24 mbpc-pc kernel: qla2xxx [0000:04:00.0]-480d:2: Relogin
scheduled.
Mar 28 02:40:24 mbpc-pc kernel: qla2xxx [0000:04:00.0]-4800:2: DPC
handler sleeping.
Mar 28 02:40:24 mbpc-pc kernel: qla2xxx [0000:04:00.0]-2908:2:
qla2x00_relogin 21:01:00:1b:32:a1:81:21 DS 0 LS 7
Mar 28 02:40:24 mbpc-pc kernel: qla2xxx [0000:04:00.0]-2902:2:
qla24xx_handle_relogin_event 21:01:00:1b:32:a1:81:21 DS 0 LS 7 P 0 del 2
cnfl (null) rscn 0|0 login 0|0 fl 3
Mar 28 02:40:24 mbpc-pc kernel: qla2xxx [0000:04:00.0]-28d8:2:
qla24xx_fcport_handle_login 21:01:00:1b:32:a1:81:21 DS 0 LS 7 P 0 fl 3
confl (null) rscn 0|0 login 0 retry 45 lid 4096 scan 2
Mar 28 02:40:24 mbpc-pc kernel: qla2xxx [0000:04:00.0]-2908:2:
qla2x00_relogin 50:01:43:80:16:77:99:38 DS 3 LS 4
Mar 28 02:40:24 mbpc-pc kernel: qla2xxx [0000:04:00.0]-2902:2:
qla24xx_handle_relogin_event 50:01:43:80:16:77:99:38 DS 3 LS 4 P 0 del 1
cnfl (null) rscn 0|0 login 4|18 fl 1
Mar 28 02:40:24 mbpc-pc kernel: qla2xxx [0000:04:00.0]-28d8:2:
qla24xx_fcport_handle_login 50:01:43:80:16:77:99:38 DS 3 LS 4 P 0 fl 1
confl (null) rscn 0|0 login 18 retry 45 lid 4 scan 2
Mar 28 02:40:24 mbpc-pc kernel: qla2xxx [0000:04:00.0]-2908:2:
qla2x00_relogin 50:01:43:80:16:77:99:3a DS 10 LS 3
Mar 28 02:40:24 mbpc-pc kernel: qla2xxx [0000:04:00.0]-2902:2:
qla24xx_handle_relogin_event 50:01:43:80:16:77:99:3a DS 10 LS 3 P 0 del
1 cnfl (null) rscn 0|0 login 8|9 fl 1
Mar 28 02:40:24 mbpc-pc kernel: qla2xxx [0000:04:00.0]-480e:2: Relogin end.
Mar 28 02:40:24 mbpc-pc kernel: qla2xxx [0000:04:00.0]-f887:2:
qlt_free_session_done: sess 00000000088a6dfa logout completed
Mar 28 02:40:24 mbpc-pc kernel: qla2xxx [0000:04:00.0]-f89a:2: se_sess
(null) / sess 00000000088a6dfa port 50:01:43:80:16:77:99:3a is
gone, releasing own PLOGI (ref=1)
Mar 28 02:40:24 mbpc-pc kernel: qla2xxx [0000:04:00.0]-5889:2: Sending
PLOGI ACK to wwn 50:01:43:80:16:77:99:3a s_id 01:09:00 loop_id 0x07 exch
0x11223c ox_id 0xae
Mar 28 02:40:24 mbpc-pc kernel: qla2xxx [0000:04:00.0]-f801:2:
Unregistration of sess 00000000088a6dfa 50:01:43:80:16:77:99:3a finished
fcp_cnt 4
Mar 28 02:40:24 mbpc-pc kernel: qla2xxx [0000:04:00.0]-28f4:2:
Async-nack 50:01:43:80:16:77:99:3a hndl 175 PLOGI
Mar 28 02:40:24 mbpc-pc kernel: qla2xxx [0000:04:00.0]-28f2:2: Async
done-nack res 0 50:01:43:80:16:77:99:3a type 16
Mar 28 02:40:24 mbpc-pc kernel: qla2xxx [0000:04:00.0]-5811:2:
Asynchronous PORT UPDATE ignored 0007/0004/0600.
Mar 28 02:40:24 mbpc-pc kernel: qla2xxx [0000:04:00.0]-f83d:2:
qla_target(0): Port update async event 0x8014 occurred: updating the
ports database (m[0]=8014, m[1]=7, m[2]=4, m[3]=600)
Mar 28 02:40:24 mbpc-pc kernel: qla2xxx [0000:04:00.0]-f83e:2: Async MB
2: Got PLOGI Complete
Thanks,
Seems similar to this issue:
ht tps://www.spinics.net/lists/linux-scsi/msg136470.html
Log link:
ht
tps://www.microdevsys.com/WordPressDownloads/qla2xxx-hba.log-recent.start.end.event.txt
Infra Setup (image):
ht tps://www.microdevsys.com/WordPressImages/brocade-san-issue.png
Adapter details on the target.
# QLogic Corp. ISP2432-based 4Gb Fibre Channel to PCI Express HBA ^C
You have mail in /var/spool/mail/root
[root@mbpc-pc 1]# lspci -vvv -s 03:00.0
03:00.0 Fibre Channel: QLogic Corp. ISP2432-based 4Gb Fibre Channel to
PCI Express HBA (rev 03)
Subsystem: QLogic Corp. Device 0146
Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop-
ParErr+ Stepping- SERR+ FastB2B- DisINTx-
Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort-
<TAbort- <MAbort- >SERR- <PERR- INTx-
Latency: 0, Cache Line Size: 64 bytes
Interrupt: pin A routed to IRQ 18
Region 0: I/O ports at ee00 [size=256]
Region 1: Memory at fdffc000 (64-bit, non-prefetchable) [size=16K]
[virtual] Expansion ROM at fdf00000 [disabled] [size=256K]
Capabilities: [44] Power Management version 2
Flags: PMEClk- DSI- D1- D2- AuxCurrent=0mA
PME(D0-,D1-,D2-,D3hot-,D3cold-)
Status: D0 NoSoftRst- PME-Enable- DSel=0 DScale=0 PME-
Capabilities: [4c] Express (v1) Endpoint, MSI 00
DevCap: MaxPayload 1024 bytes, PhantFunc 0, Latency
L0s <4us, L1 <1us
ExtTag- AttnBtn+ AttnInd+ PwrInd+ RBE- FLReset-
DevCtl: Report errors: Correctable+ Non-Fatal+ Fatal+
Unsupported+
RlxdOrd+ ExtTag- PhantFunc- AuxPwr- NoSnoop+
MaxPayload 128 bytes, MaxReadReq 4096 bytes
DevSta: CorrErr- UncorrErr+ FatalErr- UnsuppReq+
AuxPwr- TransPend-
LnkCap: Port #0, Speed 2.5GT/s, Width x4, ASPM L0s,
Latency L0 <4us, L1 unlimited
ClockPM- Surprise- LLActRep- BwNot-
LnkCtl: ASPM Disabled; RCB 64 bytes Disabled- Retrain-
CommClk-
ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt-
LnkSta: Speed 2.5GT/s, Width x4, TrErr- Train-
SlotClk+ DLActive- BWMgmt- ABWMgmt-
Capabilities: [64] MSI: Enable- Count=1/16 Maskable- 64bit+
Address: 0000000000000000 Data: 0000
Capabilities: [74] Vital Product Data
Product Name: PCI-Express Quad Channel 4Gb Fibre
Channel HBA
Read-only fields:
[PN] Part number: QLE2464
[SN] Serial number: GFC0840A74113
[V0] Vendor specific: PW=15W
[MN] Manufacture ID: 50 58 32 36 31 30 34 30
31 2d 31 31 20 20 41
[V1] Vendor specific: 06.12
[V3] Vendor specific: 08.01.02
[V4] Vendor specific: 03.29
[V5] Vendor specific: 03.23
[YA] Asset tag:
[RV] Reserved: checksum good, 0 byte(s) reserved
End
Capabilities: [7c] MSI-X: Enable- Count=16 Masked-
Vector table: BAR=1 offset=00002000
PBA: BAR=1 offset=00003000
Capabilities: [100 v1] Advanced Error Reporting
UESta: DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt-
UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq+ ACSViol-
UEMsk: DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt-
UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
UESvrt: DLP+ SDES- TLP- FCP+ CmpltTO- CmpltAbrt-
UnxCmplt- RxOF+ MalfTLP+ ECRC- UnsupReq- ACSViol-
CESta: RxErr- BadTLP- BadDLLP- Rollover- Timeout-
NonFatalErr-
CEMsk: RxErr- BadTLP- BadDLLP- Rollover- Timeout-
NonFatalErr-
AERCap: First Error Pointer: 14, GenCap+ CGenEn-
ChkCap+ ChkEn-
Capabilities: [138 v1] Power Budgeting <?>
Kernel driver in use: qla2xxx
Target details:
[root@mbpc-pc 1]# systool -c fc_host -v
Class = "fc_host"
Class Device = "host0"
Class Device path =
"/sys/devices/pci0000:00/0000:00:02.0/0000:01:00.0/0000:02:00.0/0000:03:00.0/host0/fc_host/host0"
dev_loss_tmo = "45"
fabric_name = "0xffffffffffffffff"
issue_lip = <store method only>
max_npiv_vports = "127"
node_name = "0x2002001b32c18121"
npiv_vports_inuse = "0"
port_id = "0x000000"
port_name = "0x2102001b32c18121"
port_state = "Linkdown"
port_type = "Unknown"
speed = "unknown"
supported_classes = "Class 3"
supported_speeds = "1 Gbit, 2 Gbit, 4 Gbit"
symbolic_name = "QLE2464 FW:v8.06.02 DVR:v10.00.00.07-k-debug"
system_hostname = ""
tgtid_bind_type = "wwpn (World Wide Port Name)"
uevent =
vport_create = <store method only>
vport_delete = <store method only>
Device = "host0"
Device path =
"/sys/devices/pci0000:00/0000:00:02.0/0000:01:00.0/0000:02:00.0/0000:03:00.0/host0"
fw_dump =
issue_logo = <store method only>
nvram = "ISP "
optrom_ctl = <store method only>
optrom =
reset = <store method only>
sfp = ""
uevent = "DEVTYPE=scsi_host"
vpd = "▒."
Class Device = "host1"
Class Device path =
"/sys/devices/pci0000:00/0000:00:02.0/0000:01:00.0/0000:02:00.0/0000:03:00.1/host1/fc_host/host1"
dev_loss_tmo = "45"
fabric_name = "0xffffffffffffffff"
issue_lip = <store method only>
max_npiv_vports = "127"
node_name = "0x2003001b32e18121"
npiv_vports_inuse = "0"
port_id = "0x000000"
port_name = "0x2103001b32e18121"
port_state = "Linkdown"
port_type = "Unknown"
speed = "unknown"
supported_classes = "Class 3"
supported_speeds = "1 Gbit, 2 Gbit, 4 Gbit"
symbolic_name = "QLE2464 FW:v8.06.02 DVR:v10.00.00.07-k-debug"
system_hostname = ""
tgtid_bind_type = "wwpn (World Wide Port Name)"
uevent =
vport_create = <store method only>
vport_delete = <store method only>
Device = "host1"
Device path =
"/sys/devices/pci0000:00/0000:00:02.0/0000:01:00.0/0000:02:00.0/0000:03:00.1/host1"
fw_dump =
issue_logo = <store method only>
nvram = "ISP "
optrom_ctl = <store method only>
optrom =
reset = <store method only>
sfp = ""
uevent = "DEVTYPE=scsi_host"
vpd = "▒."
Class Device = "host2"
Class Device path =
"/sys/devices/pci0000:00/0000:00:02.0/0000:01:00.0/0000:02:01.0/0000:04:00.0/host2/fc_host/host2"
dev_loss_tmo = "45"
fabric_name = "0x100000051e903dab"
issue_lip = <store method only>
max_npiv_vports = "127"
node_name = "0x2000001b32818121"
npiv_vports_inuse = "0"
port_id = "0x011300"
port_name = "0x2100001b32818121"
port_state = "Online"
port_type = "NPort (fabric via point-to-point)"
speed = "4 Gbit"
supported_classes = "Class 3"
supported_speeds = "1 Gbit, 2 Gbit, 4 Gbit"
symbolic_name = "QLE2464 FW:v8.06.02 DVR:v10.00.00.07-k-debug"
system_hostname = ""
tgtid_bind_type = "wwpn (World Wide Port Name)"
uevent =
vport_create = <store method only>
vport_delete = <store method only>
Device = "host2"
Device path =
"/sys/devices/pci0000:00/0000:00:02.0/0000:01:00.0/0000:02:01.0/0000:04:00.0/host2"
fw_dump =
issue_logo = <store method only>
nvram = "ISP "
optrom_ctl = <store method only>
optrom =
reset = <store method only>
sfp = ""
uevent = "DEVTYPE=scsi_host"
vpd = "▒."
Class Device = "host3"
Class Device path =
"/sys/devices/pci0000:00/0000:00:02.0/0000:01:00.0/0000:02:01.0/0000:04:00.1/host3/fc_host/host3"
dev_loss_tmo = "45"
fabric_name = "0x100000051e903dab"
issue_lip = <store method only>
max_npiv_vports = "127"
node_name = "0x2001001b32a18121"
npiv_vports_inuse = "0"
port_id = "0x011700"
port_name = "0x2101001b32a18121"
port_state = "Online"
port_type = "NPort (fabric via point-to-point)"
speed = "4 Gbit"
supported_classes = "Class 3"
supported_speeds = "1 Gbit, 2 Gbit, 4 Gbit"
symbolic_name = "QLE2464 FW:v8.06.02 DVR:v10.00.00.07-k-debug"
system_hostname = ""
tgtid_bind_type = "wwpn (World Wide Port Name)"
uevent =
vport_create = <store method only>
vport_delete = <store method only>
Device = "host3"
Device path =
"/sys/devices/pci0000:00/0000:00:02.0/0000:01:00.0/0000:02:01.0/0000:04:00.1/host3"
fw_dump =
issue_logo = <store method only>
nvram = "ISP "
optrom_ctl = <store method only>
optrom =
reset = <store method only>
sfp = ""
uevent = "DEVTYPE=scsi_host"
vpd = "▒."
[root@mbpc-pc 1]#
--
Thx,
TK.