On Wed, 2016-02-10 at 22:53 -0800, Nicholas A. Bellinger wrote:
> On Tue, 2016-02-09 at 18:03 +0000, Himanshu Madhani wrote:
> > On 2/8/16, 9:25 PM, "Nicholas A. Bellinger" <nab@xxxxxxxxxxxxxxx> wrote:
> > >On Mon, 2016-02-08 at 23:27 +0000, Himanshu Madhani wrote:
> > >>
> > >> I am testing this series with the 4.5.0-rc2+ kernel and I am seeing
> > >> an issue where triggering sg_reset with the host/device/bus option
> > >> in a loop at a 120-second interval causes a call stack. At this
> > >> point, removing the configuration hangs indefinitely. See the
> > >> attached dmesg output from my setup.
> > >>
> > >
> > >Thanks a lot for testing this.
> > >
> > >So it looks like we're still hitting an indefinite schedule() on
> > >se_cmd->cmd_wait_comp once tcm_qla2xxx session disconnect/reconnect
> > >occurs, after repeated explicit active I/O remote-port sg_resets.
> > >
> > >Does this trigger on the first tcm_qla2xxx session reconnect after
> > >explicit remote-port sg_reset..? Are session reconnects actively being
> > >triggered during the test..?
> > >
> > >To verify the latter for iscsi-target, I've been using a small patch to
> > >trigger session reset from TMR kthread context in order to simulate the
> > >I_T disconnects. Something like that would be useful for verifying with
> > >tcm_qla2xxx too.
> > >
> > >That said, I'll be reproducing with tcm_qla2xxx ports this week, and
> > >will enable various debug in a WIP branch for testing.
>
> Following up here..
>
> So far using my test setup with ISP2532 ports in P2P + RAMDISK_MCP and
> v4.5-rc1, repeated remote-port active I/O LUN_RESET (sg_reset -d) has
> been functioning as expected with a blocksize_range=4k-256k + iodepth=32
> fio write-verify style workload.
>
> No ->cmd_kref -1 OOPsen or qla2xxx initiator generated ABORT_TASKs from
> outstanding target TAS responses, nor fio write-verify failures to
> report after 800x remote-port active I/O LUN_RESETS.
>
> Next step will be to verify explicit tcm_qla2xxx port + module shutdown
> after 1K test iterations, and then IBLOCK async completions <-> NVMe
> backends with the same case.
>

After letting this test run overnight up to 7k active I/O remote-port
LUN_RESETs, things are still functioning as expected.

Also, /etc/init.d/target stop was able to successfully shut down all
active sessions and unload tcm_qla2xxx after the test run.

So AFAICT, the active I/O remote-port LUN_RESET changes are stable with
tcm_qla2xxx ports, separate from the concurrent session disconnect hung
task you reported earlier.

That said, I'll likely push this series as-is for -rc4, given that Dan
has also been able to verify the non-concurrent session disconnect case
on his setup generating constant ABORT_TASKs, and it's still surviving
both cases for iscsi-target ports.

Please give the debug patch from last night a shot, and see if we can
determine the se_cmd states when you hit the hung task.

Thank you,

-nab