> -----Original Message----- > From: Bart Van Assche [mailto:bvanassche@xxxxxxx] > Sent: Tuesday, December 4, 2018 10:45 PM > To: Kashyap Desai; linux-block; Jens Axboe; Ming Lei; linux-scsi > Cc: Suganath Prabu Subramani; Sreekanth Reddy; Sathya Prakash Veerichetty > Subject: Re: [PATCH] blk-mq: Set request mapping to NULL in > blk_mq_put_driver_tag > > On Tue, 2018-12-04 at 22:17 +0530, Kashyap Desai wrote: > > + Linux-scsi > > > > > > diff --git a/block/blk-mq.h b/block/blk-mq.h > > > > index 9497b47..57432be 100644 > > > > --- a/block/blk-mq.h > > > > +++ b/block/blk-mq.h > > > > @@ -175,6 +175,7 @@ static inline bool > > > > blk_mq_get_dispatch_budget(struct blk_mq_hw_ctx *hctx) > > > > static inline void __blk_mq_put_driver_tag(struct blk_mq_hw_ctx > *hctx, > > > > struct request *rq) > > > > { > > > > + hctx->tags->rqs[rq->tag] = NULL; > > > > blk_mq_put_tag(hctx, hctx->tags, rq->mq_ctx, rq->tag); > > > > rq->tag = -1; > > > > > > No SCSI driver should call scsi_host_find_tag() after a request has > > > finished. The above patch introduces yet another race and hence can't > > > be > > > a proper fix. > > > > Bart, many scsi drivers use scsi_host_find_tag() to traverse max tag_id > > to > > find out pending IO in firmware. > > One of the use case is - HBA firmware recovery. In case of firmware > > recovery, driver may require to traverse the list and return back > > pending > > scsi command to SML for retry. > > I quickly grep the scsi code and found that snic_scsi, qla4xxx, fnic, > > mpt3sas are using API scsi_host_find_tag for the same purpose. > > > > Without this patch, we hit very basic kernel panic due to page fault. > > This > > is not an issue in non-mq code path. Non-mq path use > > blk_map_queue_find_tag() and that particular API does not provide stale > > requests. > > As I wrote before, your patch doesn't fix the race you described but only > makes the race window smaller. Hi Bart, Let me explain the issue. It is not a race, but very straight issue. Let's say we have one scsi_device /dev/sda and total IO submitted + completed are some number 100. All the 100 IO is *completed*. Now, As part of Firmware recovery, driver tries to find our outstanding IOs using scsi_host_find_tag(). Block layer will return all the 100 commands to the driver but really those 100 commands are not outstanding. This patch will return *actual* outstanding commands. If scsi_device /dev/sda is not removed in OS, driver accessing scmd of those 100 commands are safe memory access. Now consider a case where scsi_device /dev/sda is removed and driver performs firmware recovery. This time driver will crash while accessing scmd (randomly based on memory reused.). Along with this patch, low level driver should make sure that all request queue at block layer is quiesce. If you want an example of how to use > scsi_host_find_tag() properly, have a look at the SRP initiator driver > (drivers/infiniband/ulp/srp). That driver uses scsi_host_find_tag() > without > triggering any NULL pointer dereferences. I am not able to find right context from srp, but I check the srp code and looks like that driver is getting scmd using scsi_host_find_tag() for live command. > The approach used in that driver > also works when having to support HBA firmware recovery. > > Bart.