Re: ESX FC host connectivity issues

On Sat, 2016-02-27 at 05:41 -0500, DAVID S wrote:
> Dan/All,
> 
> Sorry for the delay in responding to this, but I am indeed having
> similar issues. I haven't been able to consistently replicate the
> crashes, but I did have one today when I powered on every VM in my
> environment simultaneously, and I have pasted relevant errors found
> when running journalctl -p err. I believe the crash was at 04:49 on
> Feb 27.
> 
> Feb 26 17:20:59 storage.example.home kernel: qla2xxx [0000:06:00.0]-0121:9: Failed to enable receiving of RSCN requests: 0x2.
> Feb 26 17:21:00 storage.example.home kernel: MODE SENSE: unimplemented page/subpage: 0x1c/0x02
> Feb 26 17:21:00 storage.example.home kernel: MODE SENSE: unimplemented page/subpage: 0x1c/0x02
> Feb 26 17:21:00 storage.example.home kernel: MODE SENSE: unimplemented page/subpage: 0x1c/0x02
> Feb 26 17:21:06 storage.example.home kernel: qla2xxx [0000:06:00.1]-0121:10: Failed to enable receiving of RSCN requests: 0x2.
> Feb 26 17:51:09 storage.example.home kernel: qla2xxx [0000:06:00.1]-0121:10: Failed to enable receiving of RSCN requests: 0x2.
> Feb 26 17:51:32 storage.example.home kernel: qla2xxx [0000:06:00.0]-f094:9: sess ffff8800ca9ca780 received double plogi.
> Feb 26 17:51:52 storage.example.home kernel: qla2xxx [0000:06:00.0]-f094:9: sess ffff8800ca9ca780 received double plogi.
> Feb 26 17:52:17 storage.example.home kernel: qla2xxx [0000:06:00.0]-505e:9: Link is offline.
> Feb 26 17:52:31 storage.example.home kernel: qla2xxx [0000:06:00.0]-505e:9: Link is offline.
> Feb 26 17:52:51 storage.example.home kernel: qla2xxx [0000:06:00.0]-f094:9: sess ffff8800ca9ca780 received double plogi.
> Feb 26 17:53:11 storage.example.home kernel: qla2xxx [0000:06:00.0]-f094:9: sess ffff8800ca9ca780 received double plogi.
> Feb 26 17:53:31 storage.example.home kernel: qla2xxx [0000:06:00.0]-f094:9: sess ffff8800ca9ca780 received double plogi.
> Feb 26 17:53:51 storage.example.home kernel: qla2xxx [0000:06:00.0]-f094:9: sess ffff8800ca9ca780 received double plogi.
> Feb 26 17:54:11 storage.example.home kernel: qla2xxx [0000:06:00.0]-f094:9: sess ffff8800ca9ca780 received double plogi.
> Feb 26 17:54:32 storage.example.home kernel: qla2xxx [0000:06:00.0]-f094:9: sess ffff8800ca9ca780 received double plogi.
> Feb 26 17:54:52 storage.example.home kernel: qla2xxx [0000:06:00.0]-f094:9: sess ffff8800ca9ca780 received double plogi.
> Feb 26 17:55:12 storage.example.home kernel: qla2xxx [0000:06:00.0]-f094:9: sess ffff8800ca9ca780 received double plogi.
> Feb 26 17:55:32 storage.example.home kernel: qla2xxx [0000:06:00.0]-f094:9: sess ffff8800ca9ca780 received double plogi.
> Feb 26 18:45:35 storage.example.home kernel: MODE SENSE: unimplemented page/subpage: 0x1c/0x02
> Feb 26 18:45:35 storage.example.home kernel: MODE SENSE: unimplemented page/subpage: 0x1c/0x02
> Feb 26 19:12:19 storage.example.home kernel: qla2xxx [0000:06:00.0]-f094:9: sess ffff8800ca9ca780 received double plogi.
> Feb 26 19:12:25 storage.example.home kernel: qla2xxx [0000:06:00.1]-0121:10: Failed to enable receiving of RSCN requests: 0x2.
> Feb 26 19:12:39 storage.example.home kernel: qla2xxx [0000:06:00.0]-f094:9: sess ffff8800ca9ca780 received double plogi.
> Feb 26 19:13:00 storage.example.home kernel: qla2xxx [0000:06:00.0]-f094:9: sess ffff8800ca9ca780 received double plogi.
> Feb 26 19:13:20 storage.example.home kernel: qla2xxx [0000:06:00.0]-f094:9: sess ffff8800ca9ca780 received double plogi.
> Feb 26 19:13:40 storage.example.home kernel: qla2xxx [0000:06:00.0]-f094:9: sess ffff8800ca9ca780 received double plogi.
> Feb 26 19:14:00 storage.example.home kernel: qla2xxx [0000:06:00.0]-f094:9: sess ffff8800ca9ca780 received double plogi.
> Feb 26 19:14:20 storage.example.home kernel: qla2xxx [0000:06:00.0]-f094:9: sess ffff8800ca9ca780 received double plogi.
> Feb 26 19:14:40 storage.example.home kernel: qla2xxx [0000:06:00.0]-f094:9: sess ffff8800ca9ca780 received double plogi.
> Feb 26 19:15:01 storage.example.home kernel: qla2xxx [0000:06:00.0]-f094:9: sess ffff8800ca9ca780 received double plogi.
> Feb 26 19:17:00 storage.example.home kernel: qla2xxx [0000:06:00.1]-f095:10: sess ffff880224459240 PRLI received, before plogi ack.
> Feb 26 19:24:18 storage.example.home systemd[1]: Failed unmounting Configuration File System.
> Feb 26 19:24:18 storage.example.home systemd[1]: Failed unmounting /var.
> Feb 26 19:24:18 storage.example.home kernel: watchdog watchdog0: watchdog did not stop!
> -- Reboot --
> Feb 26 14:25:36 storage.example.home kernel: ERST: Can not request [mem 0xcff69000-0xcff69fff] for ERST.
> Feb 26 19:25:39 storage.example.home kernel: kvm: disabled by bios
> Feb 26 19:25:39 storage.example.home kernel: kvm: disabled by bios
> Feb 26 19:31:04 storage.example.home kernel: qla2xxx [0000:06:00.0]-0121:9: Failed to enable receiving of RSCN requests: 0x2.
> Feb 26 19:31:31 storage.example.home kernel: MODE SENSE: unimplemented page/subpage: 0x1c/0x02
> Feb 26 19:31:31 storage.example.home kernel: MODE SENSE: unimplemented page/subpage: 0x1c/0x02
> Feb 26 19:31:31 storage.example.home kernel: MODE SENSE: unimplemented page/subpage: 0x1c/0x02
> Feb 26 20:10:38 storage.example.home kernel: MODE SENSE: unimplemented page/subpage: 0x1c/0x02
> Feb 26 20:10:38 storage.example.home kernel: MODE SENSE: unimplemented page/subpage: 0x1c/0x02
> Feb 26 20:10:39 storage.example.home kernel: MODE SENSE: unimplemented page/subpage: 0x1c/0x02
> Feb 26 20:24:57 storage.example.home kernel: MODE SENSE: unimplemented page/subpage: 0x1c/0x02
> Feb 26 20:24:57 storage.example.home kernel: MODE SENSE: unimplemented page/subpage: 0x1c/0x02
> Feb 26 20:24:57 storage.example.home kernel: MODE SENSE: unimplemented page/subpage: 0x1c/0x02
> -- Reboot --
> Feb 26 23:55:55 storage.example.home kernel: ERST: Can not request [mem 0xcff69000-0xcff69fff] for ERST.
> -- Reboot --
> Feb 27 04:49:20 storage.example.home kernel: kernel BUG at drivers/scsi/qla2xxx/qla_target.c:3099!
> -- Reboot --
> Feb 27 04:55:58 storage.example.home kernel: kvm: disabled by bios
> Feb 27 04:55:58 storage.example.home kernel: kvm: disabled by bios
> Feb 27 04:55:58 storage.example.home kernel: kvm: disabled by bios
> Feb 27 04:56:34 storage.example.home kernel: qla2xxx [0000:06:00.1]-0121:10: Failed to enable receiving of RSCN requests: 0x2.
> Feb 27 04:56:47 storage.example.home kernel: qla2xxx [0000:06:00.0]-0121:9: Failed to enable receiving of RSCN requests: 0x2.
> Feb 27 05:13:12 storage.example.home dbus[899]: Can't send to audit system: USER_AVC avc:  received policyload notice (seqno=2) exe="/usr/bin/dbus-daemon" sauid=81 hostname=? addr=? terminal=?
> Feb 27 05:15:14 storage.example.home kernel: kernel BUG at drivers/scsi/qla2xxx/qla_target.c:3099!

This is the exact same bug that Dan hit originally, and has been
addressed in v4.5-rc4 and target-pending/4.4-stable code.

http://www.spinics.net/lists/target-devel/msg11843.html

Note that these patches have not made it into the stable and distro
kernels yet, so you'll need to follow the instructions at the URL above
to check out a kernel tree with them applied.
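
For keeping track of when the crashes happen relative to the qla2xxx
messages, something along these lines may help. It is only a minimal
sketch in Python: the function name summarize() and both regular
expressions are my own assumptions, written against the line format in
the journalctl -p err output quoted above, and it expects each log
entry on a single line (no mail-client wrapping).

```python
import re
from collections import Counter

# Matches e.g. "qla2xxx [0000:06:00.0]-f094:9: sess ... received double plogi."
# (pattern inferred from the quoted log lines; an assumption, not a driver spec)
QLA_RE = re.compile(
    r"qla2xxx \[(?P<pci>[0-9a-f:.]+)\]-(?P<code>[0-9a-f]{4}):(?P<host>\d+): (?P<msg>.*)"
)

# Matches e.g. "Feb 27 04:49:20 host kernel: kernel BUG at path/file.c:3099!"
BUG_RE = re.compile(
    r"^(?P<ts>\w{3}\s+\d+ \d\d:\d\d:\d\d) .*kernel BUG at (?P<where>\S+:\d+)!"
)


def summarize(lines):
    """Count qla2xxx messages per (PCI function, message code) and
    collect (timestamp, file:line) for every 'kernel BUG at' entry."""
    counts = Counter()
    bugs = []
    for line in lines:
        # Drop mail quoting ("> ") and trailing whitespace, if any.
        line = line.lstrip("> ").rstrip()
        m = QLA_RE.search(line)
        if m:
            counts[(m["pci"], m["code"])] += 1
            continue
        m = BUG_RE.search(line)
        if m:
            bugs.append((m["ts"], m["where"]))
    return counts, bugs
```

Feeding it the journal lines (for example, read from a saved copy of
the journalctl output) gives a per-port tally of codes such as f094
(double plogi) and 0121 (RSCN), plus the timestamps of each BUG hit,
which makes it easier to see what preceded a crash.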

> 
> I did also see some things on a different mailing list where people
> referenced a firmware issue with the qla2xxx driver when being used as
> a target, but I'm not sure if that's relevant in this discussion (they
> said it only appears when the initiator is another Linux machine
> running on kernel 4.1+).
> 
> Anyway, I'll keep an eye on this and try to keep better track of
> exactly when/why the crashes happen.
> 
> Please let me know if there's any other helpful/relevant information
> that I can provide to help pinpoint this issue.
> 

I'd also recommend disabling ATS heartbeat in ESX-5u2 and above, as it's
a well-known ESX host bug that affects every target with VAAI enabled:

https://kb.vmware.com/selfservice/microsites/search.do?language=en_US&cmd=displayKC&externalId=2113956

--nab
