On Sat, 2016-02-27 at 05:41 -0500, DAVID S wrote:
> Dan/All,
>
> Sorry for the delay in responding to this, but I am indeed having
> similar issues. I haven't been able to consistently replicate the
> crashes, but I did have one today when I powered on every VM in my
> environment simultaneously, and I have pasted relevant errors found
> when running journalctl -p err. I believe the crash was at 04:49 on
> Feb 27.
>
> Feb 26 17:20:59 storage.example.home kernel: qla2xxx [0000:06:00.0]-0121:9: Failed to enable receiving of RSCN requests: 0x2.
> Feb 26 17:21:00 storage.example.home kernel: MODE SENSE: unimplemented page/subpage: 0x1c/0x02
> Feb 26 17:21:00 storage.example.home kernel: MODE SENSE: unimplemented page/subpage: 0x1c/0x02
> Feb 26 17:21:00 storage.example.home kernel: MODE SENSE: unimplemented page/subpage: 0x1c/0x02
> Feb 26 17:21:06 storage.example.home kernel: qla2xxx [0000:06:00.1]-0121:10: Failed to enable receiving of RSCN requests: 0x2.
> Feb 26 17:51:09 storage.example.home kernel: qla2xxx [0000:06:00.1]-0121:10: Failed to enable receiving of RSCN requests: 0x2.
> Feb 26 17:51:32 storage.example.home kernel: qla2xxx [0000:06:00.0]-f094:9: sess ffff8800ca9ca780 received double plogi.
> Feb 26 17:51:52 storage.example.home kernel: qla2xxx [0000:06:00.0]-f094:9: sess ffff8800ca9ca780 received double plogi.
> Feb 26 17:52:17 storage.example.home kernel: qla2xxx [0000:06:00.0]-505e:9: Link is offline.
> Feb 26 17:52:31 storage.example.home kernel: qla2xxx [0000:06:00.0]-505e:9: Link is offline.
> Feb 26 17:52:51 storage.example.home kernel: qla2xxx [0000:06:00.0]-f094:9: sess ffff8800ca9ca780 received double plogi.
> Feb 26 17:53:11 storage.example.home kernel: qla2xxx [0000:06:00.0]-f094:9: sess ffff8800ca9ca780 received double plogi.
> Feb 26 17:53:31 storage.example.home kernel: qla2xxx [0000:06:00.0]-f094:9: sess ffff8800ca9ca780 received double plogi.
> Feb 26 17:53:51 storage.example.home kernel: qla2xxx [0000:06:00.0]-f094:9: sess ffff8800ca9ca780 received double plogi.
> Feb 26 17:54:11 storage.example.home kernel: qla2xxx [0000:06:00.0]-f094:9: sess ffff8800ca9ca780 received double plogi.
> Feb 26 17:54:32 storage.example.home kernel: qla2xxx [0000:06:00.0]-f094:9: sess ffff8800ca9ca780 received double plogi.
> Feb 26 17:54:52 storage.example.home kernel: qla2xxx [0000:06:00.0]-f094:9: sess ffff8800ca9ca780 received double plogi.
> Feb 26 17:55:12 storage.example.home kernel: qla2xxx [0000:06:00.0]-f094:9: sess ffff8800ca9ca780 received double plogi.
> Feb 26 17:55:32 storage.example.home kernel: qla2xxx [0000:06:00.0]-f094:9: sess ffff8800ca9ca780 received double plogi.
> Feb 26 18:45:35 storage.example.home kernel: MODE SENSE: unimplemented page/subpage: 0x1c/0x02
> Feb 26 18:45:35 storage.example.home kernel: MODE SENSE: unimplemented page/subpage: 0x1c/0x02
> Feb 26 19:12:19 storage.example.home kernel: qla2xxx [0000:06:00.0]-f094:9: sess ffff8800ca9ca780 received double plogi.
> Feb 26 19:12:25 storage.example.home kernel: qla2xxx [0000:06:00.1]-0121:10: Failed to enable receiving of RSCN requests: 0x2.
> Feb 26 19:12:39 storage.example.home kernel: qla2xxx [0000:06:00.0]-f094:9: sess ffff8800ca9ca780 received double plogi.
> Feb 26 19:13:00 storage.example.home kernel: qla2xxx [0000:06:00.0]-f094:9: sess ffff8800ca9ca780 received double plogi.
> Feb 26 19:13:20 storage.example.home kernel: qla2xxx [0000:06:00.0]-f094:9: sess ffff8800ca9ca780 received double plogi.
> Feb 26 19:13:40 storage.example.home kernel: qla2xxx [0000:06:00.0]-f094:9: sess ffff8800ca9ca780 received double plogi.
> Feb 26 19:14:00 storage.example.home kernel: qla2xxx [0000:06:00.0]-f094:9: sess ffff8800ca9ca780 received double plogi.
> Feb 26 19:14:20 storage.example.home kernel: qla2xxx [0000:06:00.0]-f094:9: sess ffff8800ca9ca780 received double plogi.
> Feb 26 19:14:40 storage.example.home kernel: qla2xxx [0000:06:00.0]-f094:9: sess ffff8800ca9ca780 received double plogi.
> Feb 26 19:15:01 storage.example.home kernel: qla2xxx [0000:06:00.0]-f094:9: sess ffff8800ca9ca780 received double plogi.
> Feb 26 19:17:00 storage.example.home kernel: qla2xxx [0000:06:00.1]-f095:10: sess ffff880224459240 PRLI received, before plogi ack.
> Feb 26 19:24:18 storage.example.home systemd[1]: Failed unmounting Configuration File System.
> Feb 26 19:24:18 storage.example.home systemd[1]: Failed unmounting /var.
> Feb 26 19:24:18 storage.example.home kernel: watchdog watchdog0: watchdog did not stop!
> -- Reboot --
> Feb 26 14:25:36 storage.example.home kernel: ERST: Can not request [mem 0xcff69000-0xcff69fff] for ERST.
> Feb 26 19:25:39 storage.example.home kernel: kvm: disabled by bios
> Feb 26 19:25:39 storage.example.home kernel: kvm: disabled by bios
> Feb 26 19:31:04 storage.example.home kernel: qla2xxx [0000:06:00.0]-0121:9: Failed to enable receiving of RSCN requests: 0x2.
> Feb 26 19:31:31 storage.example.home kernel: MODE SENSE: unimplemented page/subpage: 0x1c/0x02
> Feb 26 19:31:31 storage.example.home kernel: MODE SENSE: unimplemented page/subpage: 0x1c/0x02
> Feb 26 19:31:31 storage.example.home kernel: MODE SENSE: unimplemented page/subpage: 0x1c/0x02
> Feb 26 20:10:38 storage.example.home kernel: MODE SENSE: unimplemented page/subpage: 0x1c/0x02
> Feb 26 20:10:38 storage.example.home kernel: MODE SENSE: unimplemented page/subpage: 0x1c/0x02
> Feb 26 20:10:39 storage.example.home kernel: MODE SENSE: unimplemented page/subpage: 0x1c/0x02
> Feb 26 20:24:57 storage.example.home kernel: MODE SENSE: unimplemented page/subpage: 0x1c/0x02
> Feb 26 20:24:57 storage.example.home kernel: MODE SENSE: unimplemented page/subpage: 0x1c/0x02
> Feb 26 20:24:57 storage.example.home kernel: MODE SENSE: unimplemented page/subpage: 0x1c/0x02
> -- Reboot --
> Feb 26 23:55:55 storage.example.home kernel: ERST: Can not request [mem 0xcff69000-0xcff69fff] for ERST.
> -- Reboot --
> Feb 27 04:49:20 storage.example.home kernel: kernel BUG at drivers/scsi/qla2xxx/qla_target.c:3099!
> -- Reboot --
> Feb 27 04:55:58 storage.example.home kernel: kvm: disabled by bios
> Feb 27 04:55:58 storage.example.home kernel: kvm: disabled by bios
> Feb 27 04:55:58 storage.example.home kernel: kvm: disabled by bios
> Feb 27 04:56:34 storage.example.home kernel: qla2xxx [0000:06:00.1]-0121:10: Failed to enable receiving of RSCN requests: 0x2.
> Feb 27 04:56:47 storage.example.home kernel: qla2xxx [0000:06:00.0]-0121:9: Failed to enable receiving of RSCN requests: 0x2.
> Feb 27 05:13:12 storage.example.home dbus[899]: Can't send to audit system: USER_AVC avc: received policyload notice (seqno=2)
> exe="/usr/bin/dbus-daemon" sauid=81 hostname=? addr=? terminal=?
> Feb 27 05:15:14 storage.example.home kernel: kernel BUG at drivers/scsi/qla2xxx/qla_target.c:3099!

This is the exact same bug that Dan hit originally, and it has been
addressed in v4.5-rc4 and the target-pending/4.4-stable code:

http://www.spinics.net/lists/target-devel/msg11843.html

Note that these patches have not made it into the stable and distro
kernels yet, so you'll need to follow the instructions at the URL above
to check out a kernel tree with them applied.

>
> I did also see some things on a different mailing list where people
> referenced a firmware issue with the qla2xxx driver when being used as
> a target, but I'm not sure if that's relevant in this discussion (they
> said it only appears when the initiator is another linux machine
> running on kernel 4.1+).
>
> Anyway, I'll keep an eye on this and try to keep better track of
> exactly when/why the crashes happen.
>
> Please let me know if there's any other helpful/relevant information
> that I can provide to help pinpoint this issue.
>

I'd also recommend disabling ATS heartbeat in ESX-5u2 and above, as it's
a well-known ESX host bug that affects every target with VAAI enabled:

https://kb.vmware.com/selfservice/microsites/search.do?language=en_US&cmd=displayKC&externalId=2113956

--nab
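
For keeping better track of when the crashes happen, it may help to condense the `journalctl -p err` output instead of reading it line by line. The following is only a minimal sketch and not something taken from this thread: the function name `summarize`, the regular expressions, and the `<ptr>` placeholder are all assumptions chosen for illustration, and the sample lines are copied from the log quoted above. It picks out the `kernel BUG at` hits and counts repeated messages, masking the per-session pointer so that identical qla2xxx messages group together.

```python
import re
from collections import Counter

# Syslog-style line as printed by `journalctl -p err` (assumed layout):
#   "<Mon DD HH:MM:SS> <host> <unit>: <message>"
LINE_RE = re.compile(r"^(\w{3}\s+\d+\s+\d\d:\d\d:\d\d)\s+(\S+)\s+([^:]+):\s+(.*)$")
# "kernel BUG at <file>:<line>!"
BUG_RE = re.compile(r"kernel BUG at (\S+?):(\d+)!")
# 16-digit hex session pointers change between sessions; mask them for grouping.
PTR_RE = re.compile(r"\b[0-9a-f]{16}\b")

# Sample input copied from the quoted log (mail quote prefix included).
SAMPLE = [
    "> Feb 26 17:51:32 storage.example.home kernel: qla2xxx [0000:06:00.0]-f094:9: sess ffff8800ca9ca780 received double plogi.",
    "> Feb 26 17:51:52 storage.example.home kernel: qla2xxx [0000:06:00.0]-f094:9: sess ffff8800ca9ca780 received double plogi.",
    "> Feb 26 17:52:17 storage.example.home kernel: qla2xxx [0000:06:00.0]-505e:9: Link is offline.",
    "> -- Reboot --",
    "> Feb 27 04:49:20 storage.example.home kernel: kernel BUG at drivers/scsi/qla2xxx/qla_target.c:3099!",
]


def summarize(lines):
    """Return (bug_hits, message_counts) for an iterable of journal lines.

    bug_hits is a list of (timestamp, source_file, line_number) tuples;
    message_counts is a Counter keyed by the pointer-masked message text.
    """
    bugs = []
    counts = Counter()
    for raw in lines:
        # Drop any mail quote prefix ("> ") and surrounding whitespace.
        line = raw.lstrip("> ").strip()
        if not line or line == "-- Reboot --":
            continue
        match = LINE_RE.match(line)
        if not match:
            continue  # not a syslog-style line; skip it
        stamp, _host, _unit, msg = match.groups()
        bug = BUG_RE.search(msg)
        if bug:
            bugs.append((stamp, bug.group(1), int(bug.group(2))))
        counts[PTR_RE.sub("<ptr>", msg)] += 1
    return bugs, counts


if __name__ == "__main__":
    bug_hits, message_counts = summarize(SAMPLE)
    for stamp, path, lineno in bug_hits:
        print(f"BUG at {path}:{lineno} ({stamp})")
    for msg, n in message_counts.most_common():
        print(f"{n:4d}  {msg}")
```

On a live system the same function could be fed the output of `journalctl -p err` line by line; comparing the timestamps of the `kernel BUG` hits with the preceding qla2xxx messages is one way to see what led up to each crash.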