On 12/4/18 2:00 AM, Kashyap Desai wrote:
Problem statement:
Whenever a driver tries to get an outstanding request via scsi_host_find_tag(), the block layer returns stale entries instead of the actual outstanding request. The kernel panics if the stale entry is inaccessible or its memory has been reused.

Fix:
Undo the request mapping in blk_mq_put_driver_tag() once the request is returned.

More detail:
Whenever an SDEV entry is created, the block layer allocates separate tags and static requests. Those requests are not valid after the SDEV is deleted from the system. On the fly, the block layer maps static rqs to rqs as below in blk_mq_get_driver_tag():

data.hctx->tags->rqs[rq->tag] = rq;

The above mapping covers active, in-use requests, and it is the same mapping that scsi_host_find_tag() refers to. After running some I/O, "data.hctx->tags->rqs[rq->tag]" will hold entries that are never reset by the block layer. A kernel panic occurs if the request that "data.hctx->tags->rqs[rq->tag]" points to belongs to an "sdev" that has been removed, because the memory allocated for the requests of that sdev may have been reused or may be inaccessible to the driver.
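For readers following the description above, the stale-entry behaviour can be sketched in a small userspace model. This is only an illustration under stated assumptions, not kernel code: every model_* name, MODEL_NR_TAGS, and the clear_mapping flag are hypothetical and stand in for the tags->rqs[] table and the one-line change proposed in the patch. The model is single-threaded, so it says nothing about concurrent access to the table.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MODEL_NR_TAGS 4	/* hypothetical table size */

/* Hypothetical stand-in for struct request; only the tag matters here. */
struct model_request {
	int tag;	/* -1 when no driver tag is assigned */
};

/* Hypothetical stand-in for the tag -> request lookup table. */
struct model_tags {
	struct model_request *rqs[MODEL_NR_TAGS];
};

/* Mirrors the assignment quoted from blk_mq_get_driver_tag(). */
static void model_get_driver_tag(struct model_tags *tags,
				 struct model_request *rq, int tag)
{
	assert(tag >= 0 && tag < MODEL_NR_TAGS);
	rq->tag = tag;
	tags->rqs[tag] = rq;
}

/* Releases the tag; clear_mapping models the line added by the patch. */
static void model_put_driver_tag(struct model_tags *tags,
				 struct model_request *rq, bool clear_mapping)
{
	if (clear_mapping)
		tags->rqs[rq->tag] = NULL;
	rq->tag = -1;
}

/* Models a driver looking up a request by tag. */
static struct model_request *model_find_tag(struct model_tags *tags, int tag)
{
	return tags->rqs[tag];
}

/* Returns 1 if a lookup after completion still yields the old request. */
static int model_stale_lookup_demo(bool clear_mapping)
{
	struct model_tags tags = { { NULL } };
	struct model_request rq = { .tag = -1 };

	model_get_driver_tag(&tags, &rq, 2);
	model_put_driver_tag(&tags, &rq, clear_mapping);

	return model_find_tag(&tags, 2) == &rq;
}
```

In this model, model_stale_lookup_demo(false) reports that the table still points at the completed request, while model_stale_lookup_demo(true) reports that the entry was cleared. In the real kernel the stale pointer may refer to freed memory, which the model deliberately avoids touching.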
Kernel panic snippet -

BUG: unable to handle kernel paging request at ffffff8000000010
IP: [<ffffffffc048306c>] mpt3sas_scsih_scsi_lookup_get+0x6c/0xc0 [mpt3sas]
PGD aa4414067 PUD 0
Oops: 0000 [#1] SMP
Call Trace:
 [<ffffffffc046f72f>] mpt3sas_get_st_from_smid+0x1f/0x60 [mpt3sas]
 [<ffffffffc047e125>] scsih_shutdown+0x55/0x100 [mpt3sas]

Cc: <stable@xxxxxxxxxxxxxxx>
Signed-off-by: Kashyap Desai <kashyap.desai@xxxxxxxxxxxx>
Signed-off-by: Sreekanth Reddy <sreekanth.reddy@xxxxxxxxxxxx>
---
 block/blk-mq.h | 1 +
 1 file changed, 1 insertion(+)

diff --git a/block/blk-mq.h b/block/blk-mq.h
index 9497b47..57432be 100644
--- a/block/blk-mq.h
+++ b/block/blk-mq.h
@@ -175,6 +175,7 @@ static inline bool blk_mq_get_dispatch_budget(struct blk_mq_hw_ctx *hctx)
 static inline void __blk_mq_put_driver_tag(struct blk_mq_hw_ctx *hctx,
 					   struct request *rq)
 {
+	hctx->tags->rqs[rq->tag] = NULL;
 	blk_mq_put_tag(hctx, hctx->tags, rq->mq_ctx, rq->tag);
 	rq->tag = -1;

No SCSI driver should call scsi_host_find_tag() after a request has finished. The above patch introduces yet another race and hence can't be a proper fix.
Bart.