https://bugzilla.kernel.org/show_bug.cgi?id=214147

--- Comment #2 from michael.christie@xxxxxxxxxx ---
On 8/23/21 6:08 AM, bugzilla-daemon@xxxxxxxxxxxxxxxxxxx wrote:
> https://bugzilla.kernel.org/show_bug.cgi?id=214147
>
>             Bug ID: 214147
>            Summary: ISCSI broken in last release
>            Product: IO/Storage
>            Version: 2.5
>     Kernel Version: 5.13.12
>           Hardware: All
>                 OS: Linux
>               Tree: Mainline
>             Status: NEW
>           Severity: normal
>           Priority: P1
>          Component: SCSI
>           Assignee: linux-scsi@xxxxxxxxxxxxxxx
>           Reporter: slavon.net@xxxxxxxxx
>         Regression: Yes
>
> Created attachment 298441
>   --> https://bugzilla.kernel.org/attachment.cgi?id=298441&action=edit
> dmesg log
>
> After some time iscsi go to broke and help only reboot
>

What are you doing when you hit the issue? What does your target setup look
like? What are you using for the backing store? Are you able to build your
own kernels?

The only major change between 5.12 and 5.13 is a set of target patches that
batch cmds. However, it looks like you start to hit a problem before that
code comes into play. The first thing we see is a DataOut timeout. At that
point the target does not have all the data for the cmd yet, so the 5.13
target changes are not involved:

[10931.107057] Unable to recover from DataOut timeout while in ERL=0, closing iSCSI connection for I_T Nexus iqn.1991-05.com.microsoft:vhost11.dev.obs.group,i,0x400001370002,iqn.2003-01.org.linux-iscsi.vm2.x8664:sn.b07943625401,t,0x01

However, some cmds have made it to the core target layer, because we can see
the target layer waiting on cmds to complete as part of the lun reset
handling:

[19906.593285] INFO: task kworker/4:1:3770999 blocked for more than 122 seconds.
[19906.603670] Tainted: P O 5.13.12-1.el8.elrepo.x86_64 #1
[19906.613975] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[19906.624208] task:kworker/4:1 state:D stack: 0 pid:3770999 ppid: 2 flags:0x00004000
[19906.624212] Workqueue: events target_tmr_work [target_core_mod]
[19906.624247] Call Trace:
[19906.624249] __schedule+0x396/0x8a0
[19906.624252] schedule+0x3c/0xa0
[19906.624255] schedule_timeout+0x215/0x2b0
[19906.624258] ? kasprintf+0x4e/0x70
[19906.624261] wait_for_completion+0x9e/0x100
[19906.624264] target_put_cmd_and_wait+0x55/0x80 [target_core_mod]
[19906.624279] core_tmr_lun_reset+0x38b/0x660 [target_core_mod]
[19906.624294] target_tmr_work+0xb4/0x110 [target_core_mod]
[19906.624309] process_one_work+0x230/0x3d0
[19906.624312] worker_thread+0x2d/0x3e0
[19906.624314] ? process_one_work+0x3d0/0x3d0
[19906.624316] kthread+0x118/0x140
[19906.624318] ? set_kthread_struct+0x40/0x40
[19906.624320] ret_from_fork+0x1f/0x30

We can also see that the iscsi layer is not able to relogin because of the
outstanding cmds/tmfs.

I can send you a patch that reverts the core target patches. If we can rule
them out, that would help narrow things down. Or, because it sounds like this
is easy to reproduce, we can turn on some extra lio debugging.
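
The ordering argument above (DataOut timeout first, hung lun reset worker much later) can be checked mechanically against a saved dmesg log. Below is a minimal sketch, not part of any kernel or LIO tooling: the names first_occurrences, MARKERS and SAMPLE are hypothetical, the marker substrings are copied from the messages quoted in this thread, and the sample lines are abbreviated copies of those messages.

```python
import re

# Matches the "[seconds.micros]" prefix that dmesg puts on each line.
TS_LINE = re.compile(r"^\[\s*(\d+\.\d+)\]\s*(.*)$")

# Substrings copied from the log quoted in this thread (assumed to be
# stable enough to search for; adjust them for other kernel versions).
MARKERS = {
    "dataout_timeout": "Unable to recover from DataOut timeout",
    "hung_task": "blocked for more than",
}


def first_occurrences(dmesg_text):
    """Return {marker_name: timestamp} for the first line matching each marker."""
    found = {}
    for line in dmesg_text.splitlines():
        m = TS_LINE.match(line.strip())
        if not m:
            continue  # skip lines without a dmesg timestamp
        ts, msg = float(m.group(1)), m.group(2)
        for name, needle in MARKERS.items():
            if needle in msg and name not in found:
                found[name] = ts
    return found


# Abbreviated copies of the messages quoted above (the I_T Nexus part of
# the first line is cut off here to keep the sample short).
SAMPLE = """\
[10931.107057] Unable to recover from DataOut timeout while in ERL=0, closing iSCSI connection
[19906.593285] INFO: task kworker/4:1:3770999 blocked for more than 122 seconds.
[19906.624212] Workqueue: events target_tmr_work [target_core_mod]
"""

if __name__ == "__main__":
    # Print the markers in the order they first appear in the log.
    events = sorted(first_occurrences(SAMPLE).items(), key=lambda kv: kv[1])
    for name, ts in events:
        print(f"{ts:>14.6f}  {name}")
```

To run it against the real log, read the attached dmesg file into a string and pass that to first_occurrences() instead of SAMPLE. If the DataOut timeout shows up well before the first hung task report, that matches the reading above that the problem starts before the 5.13 batching code is involved.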