On Tue, 2017-02-14 at 16:25 -0800, Bart Van Assche wrote: > For several target drivers (e.g. ib_srpt and ib_isert) sleeping inside > transport_generic_free_cmd() causes RDMA completion processing to stall. > Hence only sleep inside this function if the (iSCSI) target driver > requires this. > > This patch avoids that messages similar to the following appear in the > kernel log: > > INFO: task kworker/u25:0:1013 blocked for more than 480 seconds. > Tainted: G W 4.10.0-rc7-dbg+ #3 > "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. > kworker/u25:0 D 0 1013 2 0x00000000 > Workqueue: ib-comp-wq ib_cq_poll_work [ib_core] > Call Trace: > __schedule+0x2da/0xb00 > schedule+0x38/0x90 > schedule_timeout+0x2fe/0x640 > wait_for_completion+0xfe/0x160 > transport_generic_free_cmd+0x2e/0x80 [target_core_mod] > srpt_send_done+0x59/0x9f [ib_srpt] > __ib_process_cq+0x4b/0xd0 [ib_core] > ib_cq_poll_work+0x1b/0x60 [ib_core] > process_one_work+0x208/0x6a0 > worker_thread+0x49/0x4a0 > kthread+0x107/0x140 > ret_from_fork+0x2e/0x40 > > Signed-off-by: Bart Van Assche <bart.vanassche@xxxxxxxxxxx> > Cc: <stable@xxxxxxxxxxxxxxx> > --- > drivers/target/target_core_transport.c | 9 +++------ > 1 file changed, 3 insertions(+), 6 deletions(-) > > diff --git a/drivers/target/target_core_transport.c b/drivers/target/target_core_transport.c > index 2ed9721a7202..ab1c493a9a16 100644 > --- a/drivers/target/target_core_transport.c > +++ b/drivers/target/target_core_transport.c > @@ -2504,15 +2504,13 @@ int transport_generic_free_cmd(struct se_cmd *cmd, int wait_for_tasks) > int ret = 0; > bool aborted = false, tas = false; > > - if (!(cmd->se_cmd_flags & SCF_SE_LUN_CMD)) { > - if (wait_for_tasks && (cmd->se_cmd_flags & SCF_SCSI_TMR_CDB)) > - target_wait_free_cmd(cmd, &aborted, &tas); > + if (wait_for_tasks) > + target_wait_free_cmd(cmd, &aborted, &tas); > > + if (!(cmd->se_cmd_flags & SCF_SE_LUN_CMD)) { > if (!aborted || tas) > ret = transport_put_cmd(cmd); > } else { > - if (wait_for_tasks) > - target_wait_free_cmd(cmd, &aborted, &tas); > /* > * Handle WRITE failure case where transport_generic_new_cmd() > * has already added se_cmd to state_list, but fabric has > @@ -2535,7 +2533,6 @@ int transport_generic_free_cmd(struct se_cmd *cmd, int wait_for_tasks) > */ > if (aborted) { > pr_debug("Detected CMD_T_ABORTED for ITT: %llu\n", cmd->tag); > - wait_for_completion(&cmd->cmd_wait_comp); > cmd->se_tfo->release_cmd(cmd); > ret = 1; > } Removing the aborted wait_for_completion breaks the second order scenario described for iscsi-target previously here: http://www.spinics.net/lists/target-devel/msg14456.html where a concurrent CMD_T_ABORTED occurs while an iscsi connection is shutting down, and transport_generic_free_cmd() must not return until cmd->cmd_kref reaches zero and does the complete. However, looking at transport_generic_free_cmd() in the context of ib_srpt and ib_isert, I don't see how this scenario can ever happen in as-is in upstream code. Namely, all of the transport_generic_free_cmd() callers in ib_srpt and ib_isert pass wait_for_tasks = false. And the only time transport_generic_free_cmd() will block is when wait_for_tasks = true, or when wait_for_tasks = true && abort = true. So as-is in upstream, how can transport_generic_free_cmd() ever block when wait_for_tasks = false..?