> >
> >  check_cleanup_reqs:
> >     if (qedi_conn->cmd_cleanup_req > 0) {
> > -           QEDI_INFO(&qedi->dbg_ctx, QEDI_LOG_TID,
> > -                     "Freeing tid=0x%x for cid=0x%x\n",
> > -                     cqe->itid, qedi_conn->iscsi_conn_id);
> > -           qedi_conn->cmd_cleanup_cmpl++;
> > +           ++qedi_conn->cmd_cleanup_cmpl;
> > +           QEDI_INFO(&qedi->dbg_ctx, QEDI_LOG_SCSI_TM,
> > +                     "Freeing tid=0x%x for cid=0x%x cleanup count=%d\n",
> > +                     cqe->itid, qedi_conn->iscsi_conn_id,
> > +                     qedi_conn->cmd_cleanup_cmpl);
>
> Is the issue that cmd_cleanup_cmpl's increment is not seen by
> qedi_cleanup_all_io's wait_event_interruptible_timeout call when it wakes up,
> and your patch fixes this by doing a pre increment?
>

Yes, cmd_cleanup_cmpl's increment is not seen by qedi_cleanup_all_io's
wait_event_interruptible_timeout call when it wakes up, even after the
firmware has posted all the ISCSI_CQE_TYPE_TASK_CLEANUP events for the
requested cmd_cleanup_req. And yes, the pre-increment did address this
issue. Do you feel otherwise?

> Does doing a pre increment give you barrier like behavior and is that why this
> works? I thought if wake_up ends up waking up the other thread it does a barrier
> already, so it's not clear to me how changing to a pre-increment helps.
>
> Is doing a pre-increment a common way to handle this? It looks like we do a
> post increment and wake_up* in other places. However, like in the scsi layer we
> do wake_up_process and memory-barriers.txt says that always does a general
> barrier, so is that why we can do a post increment there?
>
> Does pre-increment give you barrier like behavior, and is the wake_up call not
> waking up the process so we didn't get a barrier from that, and so that's why this
> works?
>

The issue happens before wake_up is called. When we get a surge of
ISCSI_CQE_TYPE_TASK_CLEANUP events on multiple Rx threads,
cmd_cleanup_cmpl tends to miss increments. The scenario looks more like
multiple threads accessing cmd_cleanup_cmpl and racing during the
postfix increment.
This could be because the threads read the same value at the same time.
Now that I am explaining it, I feel that instead of pre-incrementing
cmd_cleanup_cmpl, it should be an atomic variable. Do you see any issue
with that?

From the logs:
-------------------------------------------------------
[root@rhel82-leo RHEL90_LOGS]# grep -inr "qedi_iscsi_cleanup_task:2160" conn_err.log | wc -l
99
[root@rhel82-leo RHEL90_LOGS]# grep -inr "qedi_cleanup_all_io:1215" conn_err.log | wc -l
99
[root@rhel82-leo RHEL90_LOGS]# grep -inr "qedi_fp_process_cqes:925" conn_err.log | wc -l
99
[root@rhel82-leo RHEL90_LOGS]# grep -inr "qedi_fp_process_cqes:922" conn_err.log | wc -l
99
[Thu Oct 21 22:03:32 2021] [0000:a5:00.5]:[qedi_cleanup_all_io:1246]:18: i/o cmd_cleanup_req=99, not equal to cmd_cleanup_cmpl=97, cid=0x0 <<<
[Thu Oct 21 22:03:38 2021] [0000:a5:00.5]:[qedi_clearsq:1299]:18: fatal error, need hard reset, cid=0x0
-----------------------------------------------------
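
For illustration only, here is a minimal user-space sketch of the
lost-update point made above. It uses C11 atomics and pthreads, not the
driver code. The names rx_thread, count_cleanup_completions and
MAX_RX_THREADS are hypothetical; only cmd_cleanup_cmpl comes from the
patch. A plain cmd_cleanup_cmpl++ (or ++cmd_cleanup_cmpl) is a separate
load, add and store, so two Rx threads can read the same value and one
increment is lost; an atomic read-modify-write avoids that. In the
kernel the analogous change would presumably be an atomic_t updated with
atomic_inc() and read with atomic_read(), but that is an assumption
here, not the posted patch.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>

#define MAX_RX_THREADS 16	/* hypothetical upper bound for this sketch */

/* Hypothetical stand-in for qedi_conn->cmd_cleanup_cmpl. */
static atomic_int cmd_cleanup_cmpl;

struct rx_arg {
	int completions;	/* cleanup completions each thread handles */
};

/* Each simulated Rx thread counts its completions atomically. */
static void *rx_thread(void *data)
{
	const struct rx_arg *arg = data;

	for (int i = 0; i < arg->completions; i++)
		atomic_fetch_add(&cmd_cleanup_cmpl, 1); /* indivisible RMW */
	return NULL;
}

/*
 * Run nthreads simulated Rx threads, each doing per_thread increments.
 * Returns the final counter value, or -1 if a thread could not start.
 */
int count_cleanup_completions(int nthreads, int per_thread)
{
	pthread_t tid[MAX_RX_THREADS];
	struct rx_arg arg = { .completions = per_thread };
	int started = 0;

	assert(nthreads > 0 && nthreads <= MAX_RX_THREADS);
	atomic_store(&cmd_cleanup_cmpl, 0);

	for (; started < nthreads; started++)
		if (pthread_create(&tid[started], NULL, rx_thread, &arg) != 0)
			break;

	/* Join only the threads that were actually created. */
	for (int i = 0; i < started; i++)
		pthread_join(tid[i], NULL);

	return started == nthreads ? atomic_load(&cmd_cleanup_cmpl) : -1;
}
```

With the atomic counter, count_cleanup_completions(4, 100000) returns
400000 on every run. Replacing atomic_fetch_add with a plain increment
on a non-atomic int would make the result unreliable under contention,
regardless of whether the increment is written as prefix or postfix.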