On 2021-01-29 14:06, Can Guo wrote:
On 2021-01-29 11:20, Bart Van Assche wrote:
On 1/27/21 8:16 PM, Can Guo wrote:
ufshcd_compl_tm() looks for cleared bits in REG_UTP_TASK_REQ_DOOR_BELL and
calls complete() for each request whose req->end_io_data is set. There is a
race condition between TM command send and completion: req->end_io_data is
set in __ufshcd_issue_tm_cmd() without host lock protection, so it is
possible that when ufshcd_compl_tm() checks req->end_io_data, it is already
set although the corresponding tag has not yet been set in
REG_UTP_TASK_REQ_DOOR_BELL. Thus, ufshcd_tmc_handler() may wrongly complete
TMRs that have not been sent out. Fix this by protecting req->end_io_data
with the host lock, and by letting ufshcd_compl_tm() only handle TM commands
that have actually completed, instead of looking for cleared bits in
REG_UTP_TASK_REQ_DOOR_BELL.
I don't know of any other block driver that needs locking to protect races
between submission and completion context. Can the block layer timeout
mechanism be used instead of the mechanism introduced by this patch, e.g. by
using blk_execute_rq_nowait() to submit requests? That would allow reusing
the existing mechanism in the block layer core to handle races between
request completion and timeout handling.
This patch is not introducing any new mechanism; it fixes the usage of the
completion (req->end_io_data = c) introduced by commit 69a6c269c097 ("scsi:
ufs: Use blk_{get,put}_request() to allocate and free TMFs"). If you have a
better idea to fix it once and for all, we would be glad to take your change
to get it fixed ASAP.
Regards,
Can Guo.
On second thought, the first fix alone is actually enough to eliminate the
race condition. Because blk_mq_tagset_busy_iter() only iterates over
requests which are not in the IDLE state, if blk_mq_start_request() is
called within the protection of the host spin lock, ufshcd_compl_tm() cannot
run into the scenario where req->end_io_data is set but the corresponding
bit in REG_UTP_TASK_REQ_DOOR_BELL has not been set. What do you think?
Thanks,
Can Guo.
Thanks,
Bart.