On 12/18/18 9:15 AM, Bart Van Assche wrote:
> On Tue, 2018-12-18 at 12:38 +0530, Kashyap Desai wrote:
>> V1 -> V2
>> Added fix in __blk_mq_finish_request around blk_mq_put_tag() for
>> non-internal tags
>>
>> Problem statement :
>> Whenever try to get outstanding request via scsi_host_find_tag,
>> block layer will return stale entries instead of actual outstanding
>> request. Kernel panic if stale entry is inaccessible or memory is reused.
>> Fix :
>> Undo request mapping in blk_mq_put_driver_tag once request is return.
>>
>> More detail :
>> Whenever each SDEV entry is created, block layer allocate separate tags
>> and static requests. Those requests are not valid after SDEV is deleted
>> from the system. On the fly, block layer maps static rqs to rqs as below
>> from blk_mq_get_driver_tag()
>>
>> data.hctx->tags->rqs[rq->tag] = rq;
>>
>> Above mapping is active in-used requests and it is the same mapping which
>> is referred in function scsi_host_find_tag().
>> After running some IOs, “data.hctx->tags->rqs[rq->tag]” will have some
>> entries which will never be reset in block layer.
>>
>> There would be a kernel panic, If request pointing to
>> “data.hctx->tags->rqs[rq->tag]” is part of “sdev” which is removed
>> and as part of that all the memory allocation of request associated with
>> that sdev might be reused or inaccessible to the driver.
>> Kernel panic snippet -
>>
>> BUG: unable to handle kernel paging request at ffffff8000000010
>> IP: [<ffffffffc048306c>] mpt3sas_scsih_scsi_lookup_get+0x6c/0xc0 [mpt3sas]
>> PGD aa4414067 PUD 0
>> Oops: 0000 [#1] SMP
>> Call Trace:
>> [<ffffffffc046f72f>] mpt3sas_get_st_from_smid+0x1f/0x60 [mpt3sas]
>> [<ffffffffc047e125>] scsih_shutdown+0x55/0x100 [mpt3sas]
>
> Other block drivers (e.g. ib_srp, skd) do not need this to work reliably.
> It has been explained to you that the bug that you reported can be fixed
> by modifying the mpt3sas driver. So why to fix this by modifying the block
> layer? Additionally, what prevents that a race condition occurs between
> the block layer clearing hctx->tags->rqs[rq->tag] and scsi_host_find_tag()
> reading that same array element? I'm afraid that this is an attempt to
> paper over a real problem instead of fixing the root cause.

I have to agree with Bart here, I just don't see how the mpt3sas use case
is special. The change will paper over the issue in any case.

-- 
Jens Axboe