> On Fri, Oct 29, 2021 at 11:50:12AM +0900, Daejun Park wrote:
> > > On Fri, Oct 29, 2021 at 10:50:15AM +0900, Daejun Park wrote:
> > > > > On Thu, Oct 28, 2021 at 07:36:19AM +0900, Daejun Park wrote:
> > > > > > This patch addresses the issue of using the wrong API to create a
> > > > > > pre_request for HPB READ.
> > > > > > HPB READ candidate that require a pre-request will try to allocate a
> > > > > > pre-request only during request_timeout_ms (default: 0). Otherwise, it is
> > > > >
> > > > > Can you explain about 'only during request_timeout_ms'?
> > > > >
> > > > > From the following code in ufshpb_prep(), the pre-request is allocated
> > > > > for each READ IO in case of (!ufshpb_is_legacy(hba) && ufshpb_is_required_wb(hpb,
> > > > > transfer_len)).
> > > > >
> > > > >     if (!ufshpb_is_legacy(hba) &&
> > > > >         ufshpb_is_required_wb(hpb, transfer_len)) {
> > > > >             err = ufshpb_issue_pre_req(hpb, cmd, &read_id);
> > > > >
> > > > > > passed as normal READ, so deadlock problem can be resolved.
> > > > > >
> > > > > > Signed-off-by: Daejun Park <daejun7.park@xxxxxxxxxxx>
> > > > > > ---
> > > > > >  drivers/scsi/ufs/ufshpb.c | 11 +++++------
> > > > > >  drivers/scsi/ufs/ufshpb.h |  1 +
> > > > > >  2 files changed, 6 insertions(+), 6 deletions(-)
> > > > > >
> > > > > > diff --git a/drivers/scsi/ufs/ufshpb.c b/drivers/scsi/ufs/ufshpb.c
> > > > > > index 02fb51ae8b25..3117bd47d762 100644
> > > > > > --- a/drivers/scsi/ufs/ufshpb.c
> > > > > > +++ b/drivers/scsi/ufs/ufshpb.c
> > > > > > @@ -548,8 +548,7 @@ static int ufshpb_execute_pre_req(struct ufshpb_lu *hpb, struct scsi_cmnd *cmd,
> > > > > >                                   read_id);
> > > > > >          rq->cmd_len = scsi_command_size(rq->cmd);
> > > > > >
> > > > > > -        if (blk_insert_cloned_request(q, req) != BLK_STS_OK)
> > > > > > -                return -EAGAIN;
> > > > > > +        blk_execute_rq_nowait(NULL, req, true, ufshpb_pre_req_compl_fn);
> > > > >
> > > > > Be careful with the above change: blk_insert_cloned_request() allocates
> > > > > driver tag and issues the request to LLD directly, then returns the
> > > > > result. If anything fails in the code path, -EAGAIN is returned.
> > > > >
> > > > > But blk_execute_rq_nowait() simply queued the request in block layer,
> > > > > and run hw queue. It doesn't allocate driver tag, and doesn't issue it
> > > > > to LLD.
> > > > >
> > > > > So ufshpb_execute_pre_req() may think the pre-request is issued to LLD
> > > > > successfully, but actually not, maybe never. What will happen after the
> > > > > READ IO is issued to device, but the pre-request(write buffer) isn't
> > > > > sent to device?
> > > >
> > > > In that case, the HPB READ cannot get benefit from pre-request. But it is not
> > > > common case.
> > >
> > > OK, so the device will ignore the pre-request if it isn't received in
> > > time, not sure it is common or not, since blk_execute_rq_nowait()
> > > doesn't provide any feedback. Here looks blk_insert_cloned_request()
> > > is better.
> >
> > You're right.
> > > > > >
> > > > > Can you explain how this change solves the deadlock?
> > > >
> > > > The deadlock happens when the READ is waiting for allocation of the
> > > > pre-request. But the timeout code stops the waiting after the given time.
> > >
> > > If you mean blk-mq timeout code will be triggered, I think it won't.
> > > Meantime, LLD may see nothing to timeout too.
> >
> > I mean the timeout of the HPB code. Please refer to the following code:
> >
> >     if (!ufshpb_is_legacy(hba) &&
> >         ufshpb_is_required_wb(hpb, transfer_len)) {
> >             err = ufshpb_issue_pre_req(hpb, cmd, &read_id);
> >             if (err) {
> >                     unsigned long timeout;
> >
> >                     timeout = cmd->jiffies_at_alloc + msecs_to_jiffies(
> >                               hpb->params.requeue_timeout_ms);
> >
> >                     if (time_before(jiffies, timeout))
> >                             return -EAGAIN;
> >
> >                     hpb->stats.miss_cnt++;
> >                     return 0;
> >             }
> >     }
> >
> > Although the return value of ufshpb_issue_pre_req() is -EAGAIN, the code
> > ignores the return value and issues READ not HPB READ.
>
> OK, got it, this way should avoid the deadlock. But just be curious why
> you change hpb->throttle_pre_req to 4, seems it isn't necessary for
> avoiding the deadlock?

Because blk_execute_rq_nowait() calls blk_mq_run_hw_queue() rather than
dispatching WRITE_BUFFER directly. So, if the next request also requires a
pre-request, the latency of the first READ becomes longer. Limiting the
number of pre-requests prevents this extreme case.

Thanks,
Daejun
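
P.S. To make the requeue-timeout fallback quoted above concrete, here is a
minimal userspace sketch of the same decision. It is not the driver code:
sim_time_before(), sim_prep_read(), and struct sim_stats are hypothetical
stand-ins, time is counted in plain ticks (msecs_to_jiffies() is left out),
and the pre-request result is passed in as a parameter instead of coming
from ufshpb_issue_pre_req().

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical stand-in for the miss counter in hpb->stats. */
struct sim_stats {
	unsigned long miss_cnt;
};

/* Wraparound-safe "a is before b", the same idea as the kernel's time_before(). */
static bool sim_time_before(unsigned long a, unsigned long b)
{
	return (long)(a - b) < 0;
}

/*
 * Decide what to do with a READ that needs a pre-request.
 *
 * pre_req_err:     result of trying to issue the pre-request (0 on success)
 * now:             current tick count (jiffies in the driver)
 * alloc_time:      tick at which the command was allocated
 * requeue_timeout: how long to keep retrying, in ticks (assumed unit)
 *
 * Returns -EAGAIN to ask for a requeue, or 0 to go ahead. *hpb_read is
 * true only when the pre-request went out, false when falling back to a
 * normal READ.
 */
static int sim_prep_read(int pre_req_err, unsigned long now,
			 unsigned long alloc_time,
			 unsigned long requeue_timeout,
			 struct sim_stats *stats, bool *hpb_read)
{
	*hpb_read = false;

	if (pre_req_err) {
		unsigned long deadline = alloc_time + requeue_timeout;

		/* Still inside the retry window: requeue and try again. */
		if (sim_time_before(now, deadline))
			return -EAGAIN;

		/* Window expired: count a miss and issue a normal READ. */
		stats->miss_cnt++;
		return 0;
	}

	*hpb_read = true;
	return 0;
}
```

With requeue_timeout set to 0 the deadline equals the allocation time, so a
failed pre-request falls back to a normal READ right away and the READ never
waits on pre-request allocation.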