Hello Kashyap, On Wed, Dec 15, 2021 at 12:11:13AM +0530, Kashyap Desai wrote: > + John Garry > > > blk-mq can't run allocating driver tag and updating ->rqs[tag] > atomically, > > meantime blk-mq doesn't clear ->rqs[tag] after the driver tag is > released. > > > > So there is chance to iterating over one stale request just after the > tag is > > allocated and before updating ->rqs[tag]. > > > > scsi_host_busy_iter() calls scsi_host_check_in_flight() to count scsi > in-flight > > requests after scsi host is blocked, so no new scsi command can be > marked as > > SCMD_STATE_INFLIGHT. However, driver tag allocation still can be run by > blk- > > mq core. One request is marked as SCMD_STATE_INFLIGHT, but this request > > may have been kept in another slot of ->rqs[], meantime the slot can be > > allocated out but ->rqs[] isn't updated yet. Then this in-flight request > is > > counted twice as SCMD_STATE_INFLIGHT. This way causes trouble in > handling > > scsi error. > > Hi Ming, > > We found similar issue on RHEL8.5 (kernel does not have this patch in > discussion.). Issue reproduced on 5.15 kernel as well. > I understood this commit will fix specific race condition and avoid > reading incorrect host_busy value. > As per commit message - That incorrect host_busy will be just transient. > If we read after some delay, correct host_busy count will be available. > Right ? Yeah, any counter(include atomic counter) works in this way. But here it may be 'permanent' because one stale request pointer may stay in one slot of ->rqs[] for long enough time if this slot isn't reused, meantime the same request can be reallocated in case of real io scheduler. Maybe the commit log should be improved a bit for making it explicit. > > In my case (I am using shared host tag enabled driver), it is not race > condition issue but stale rqs[] entries create permanent incorrect count > of host_busy. > Example - There are two pending IOs. This IOs are timed out. Bitmap of > pending IO is tag#5 (actually belongs to hctx0), tag#10 (actually belongs > to hctx1). Note - This is a shared bit map. > If hctx0 has same address of the request at 5th and 10th index, we will It shouldn't be possible, since ->rqs[] is per-tags. If it is shared bit map, both tag#5 and tag#10 are set, and both shared_tags->rqs[5] & shared_tags->rqs[10] should point to the updated requests(timed out). > count total 2 inflight commands instead of 1 from hctx0 context + From > hctx1 context, we will count 1 inflight command = Total is 3. > Even though we read after some delay, host_busy will be incorrect. We > expect host_busy = 2 but it will return 3. > > This patch fix my issue explained above for shared host-tag case. I am > confused reading the commit message. You may not have intentionally fix > the issue as I explained but indirectly it fixes my issue. Am I correct ? > > What was an issue reported by Luojiaxiang ? I am interested to know if > issue reported by Luojiaxiang had shared host tagset enabled ? https://lore.kernel.org/linux-scsi/fe5cf6c4-ce5e-4a0f-f4ab-5c10539492cb@xxxxxxxxxx/ It should be same with yours, and the original report described & explained the issue in details. thanks, Ming > > Kashyap > > > > > Fixes the issue by not iterating over stale request. > > > > Cc: linux-scsi@xxxxxxxxxxxxxxx > > Cc: "Martin K. Petersen" <martin.petersen@xxxxxxxxxx> > > Reported-by: luojiaxing <luojiaxing@xxxxxxxxxx> > > Signed-off-by: Ming Lei <ming.lei@xxxxxxxxxx> > > --- > > block/blk-mq-tag.c | 2 +- > > 1 file changed, 1 insertion(+), 1 deletion(-) > > > > diff --git a/block/blk-mq-tag.c b/block/blk-mq-tag.c index > > 86f87346232a..ff5caeb82542 100644 > > --- a/block/blk-mq-tag.c > > +++ b/block/blk-mq-tag.c > > @@ -208,7 +208,7 @@ static struct request > > *blk_mq_find_and_get_req(struct blk_mq_tags *tags, > > > > spin_lock_irqsave(&tags->lock, flags); > > rq = tags->rqs[bitnr]; > > - if (!rq || !refcount_inc_not_zero(&rq->ref)) > > + if (!rq || rq->tag != bitnr || !refcount_inc_not_zero(&rq->ref)) > > rq = NULL; > > spin_unlock_irqrestore(&tags->lock, flags); > > return rq; > > -- > > 2.31.1 -- Ming