On 3/26/21 7:31 AM, Maurizio Lombardi wrote:
>
>
> On 23. 03. 21 at 3:58, Lv Yunlong wrote:
>> In transport_free_session, se_nacl is got from se_sess
>> with the initial reference. If se_nacl->acl_sess_list is
>> empty, se_nacl->dynamic_stop is set to true. Then the first
>> target_put_nacl(se_nacl) will drop the initial reference
>> and free se_nacl. Later there is a second target_put_nacl()
>> to put se_nacl. It may cause error in race.
>>
>> My patch sets se_nacl->dynamic_stop to false to avoid the
>> double put.
>>
>> Signed-off-by: Lv Yunlong <lyl2019@xxxxxxxxxxxxxxxx>
>> ---
>>  drivers/target/target_core_transport.c | 4 +++-
>>  1 file changed, 3 insertions(+), 1 deletion(-)
>>
>> diff --git a/drivers/target/target_core_transport.c b/drivers/target/target_core_transport.c
>> index 5ecb9f18a53d..c266defe694f 100644
>> --- a/drivers/target/target_core_transport.c
>> +++ b/drivers/target/target_core_transport.c
>> @@ -584,8 +584,10 @@ void transport_free_session(struct se_session *se_sess)
>>  		}
>>  		mutex_unlock(&se_tpg->acl_node_mutex);
>>
>> -		if (se_nacl->dynamic_stop)
>> +		if (se_nacl->dynamic_stop) {
>>  			target_put_nacl(se_nacl);
>> +			se_nacl->dynamic_stop = false;
>> +		}
>>
>>  		target_put_nacl(se_nacl);
>>  	}
>>
>
> FYI,
>
> I have received a bug report against the 5.8 kernel about task hangs that seems to involve the nacl "dynamic_stop" code
>
> Mar 4 16:49:44 gzboot kernel: [186322.177819] INFO: task targetcli:2359053 blocked for more than 120 seconds.
> Mar 4 16:49:44 gzboot kernel: [186322.178862] Tainted: P O 5.8.0-44-generic #50-Ubuntu
> Mar 4 16:49:44 gzboot kernel: [186322.179746] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> Mar 4 16:49:44 gzboot kernel: [186322.180583] targetcli D 0 2359053 2359052 0x00000000
> Mar 4 16:49:44 gzboot kernel: [186322.180586] Call Trace:
> Mar 4 16:49:44 gzboot kernel: [186322.180592] __schedule+0x212/0x5d0
> Mar 4 16:49:44 gzboot kernel: [186322.180595] ? usleep_range+0x90/0x90
> Mar 4 16:49:44 gzboot kernel: [186322.180596] schedule+0x55/0xc0
> Mar 4 16:49:44 gzboot kernel: [186322.180597] schedule_timeout+0x10f/0x160
> Mar 4 16:49:44 gzboot kernel: [186322.180601] ? evict+0x14c/0x1b0
> Mar 4 16:49:44 gzboot kernel: [186322.180602] __wait_for_common+0xa8/0x150
> Mar 4 16:49:44 gzboot kernel: [186322.180603] wait_for_completion+0x24/0x30
> Mar 4 16:49:44 gzboot kernel: [186322.180637] core_tpg_del_initiator_node_acl+0x8e/0x120 [target_core_mod]

We would need more details. That trace can also show up in normal
operation, when the target driver waits for stuck IO to complete before
calling transport_deregister_session.

I think if you hit the put bug, then we would have seen the refcount
warning in refcount.h fire before the hung task, because we do an extra
put.

Did the user switch to ACLs? Did you see my comment that there appears
to be a bug if the user adds an ACL while a dynamic ACL is in use for
the same initiator name? In that case, we do not do the put that matches
the reference taken in core_tpg_check_initiator_node_acl, so we would
hit your hang since no one ever does the last put.
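
To make the two failure modes above easier to tell apart, here is a
rough user-space toy model. It is only a sketch and not kernel code:
struct toy_nacl, toy_get(), toy_put() and the three scenario functions
are hypothetical names made up for illustration, and the assumption is
simply that the ACL delete path waits on a completion that fires when
the last reference is dropped. A missing put leaves that completion
unsignalled (the hang), while an extra put trips the underflow check
(the refcount warning) instead.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy stand-in for a node ACL. Hypothetical; not struct se_node_acl. */
struct toy_nacl {
	int refcount;    /* models the ACL's reference count */
	bool completed;  /* models the completion the delete path waits on */
	bool underflow;  /* models the refcount warning on an extra put */
};

void toy_init(struct toy_nacl *n)
{
	n->refcount = 1;         /* initial reference */
	n->completed = false;
	n->underflow = false;
}

void toy_get(struct toy_nacl *n)
{
	assert(n->refcount > 0); /* a get on a released object is a bug */
	n->refcount++;
}

void toy_put(struct toy_nacl *n)
{
	if (n->refcount == 0) {  /* put after the last reference is gone */
		n->underflow = true;
		return;
	}
	if (--n->refcount == 0)  /* last put signals the waiter */
		n->completed = true;
}

/* Every get has a matching put, so the delete path is released. */
bool balanced_delete_completes(void)
{
	struct toy_nacl n;

	toy_init(&n);
	toy_get(&n);             /* reference taken at login */
	toy_put(&n);             /* matching put at session teardown */
	toy_put(&n);             /* delete path drops the initial reference */
	return n.completed && !n.underflow;
}

/* The put matching the login-time get never happens. */
bool missing_put_completes(void)
{
	struct toy_nacl n;

	toy_init(&n);
	toy_get(&n);             /* reference taken at login */
	toy_put(&n);             /* delete path drops the initial reference */
	return n.completed;      /* count is still 1, so a waiter would hang */
}

/* One put more than there were references. */
bool extra_put_underflows(void)
{
	struct toy_nacl n;

	toy_init(&n);
	toy_get(&n);             /* reference taken at login */
	toy_put(&n);             /* first put */
	toy_put(&n);             /* second put, count reaches zero */
	toy_put(&n);             /* extra put, nothing left to drop */
	return n.underflow;
}
```

In this model the missing-put case never reports completion and never
reports an underflow, which is the pattern I would expect for a hang
with no refcount warning in the log.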