Let me send v2 addressing your comments.

Thanks,

Jaesoo Lee.

On Tue, Apr 9, 2019 at 4:45 PM Bart Van Assche <bvanassche@xxxxxxx> wrote:
>
> On Tue, 2019-04-09 at 16:29 -0700, Jaesoo Lee wrote:
> > Let me comment in line.
> >
> > On Tue, Apr 9, 2019 at 3:14 PM Bart Van Assche <bvanassche@xxxxxxx> wrote:
> > >
> > > On Tue, 2019-04-09 at 14:53 -0700, Jaesoo Lee wrote:
> > > > When SCSI blk-mq is enabled, there is a bug in handling errors in scsi_queue_rq.
> > > > Specifically, the bug is that the result field of scsi_request is not set correctly
> > > > when the dispatch of the command has failed. Since the upper layer code
> > > > including the sg_io ioctl expects to receive any error status from the result field
> > > > of scsi_request, the error is silently ignored and this could cause data
> > > > corruption for some applications. This commit also fixes another bug: the
> > > > result field is not initialized when scsi_request is allocated.
> > > >
> > > > Signed-off-by: Jaesoo Lee <jalee@xxxxxxxxxxxxxxx>
> > > > ---
> > > >  block/scsi_ioctl.c      | 1 +
> > > >  drivers/scsi/scsi_lib.c | 1 +
> > > >  2 files changed, 2 insertions(+)
> > > >
> > > > diff --git a/block/scsi_ioctl.c b/block/scsi_ioctl.c
> > > > index 533f4ae..f2d7979 100644
> > > > --- a/block/scsi_ioctl.c
> > > > +++ b/block/scsi_ioctl.c
> > > > @@ -723,6 +723,7 @@ void scsi_req_init(struct scsi_request *req)
> > > >         req->cmd = req->__cmd;
> > > >         req->cmd_len = BLK_MAX_CDB;
> > > >         req->sense_len = 0;
> > > > +       req->result = 0;
> > > >  }
> > > >  EXPORT_SYMBOL(scsi_req_init);
> > >
> > > What makes you think that this assignment is necessary?
> > >
> >
> > Actually, I discovered this before fixing this bug and we might not
> > see this problem anymore once this bug is fixed.
> >
> > Previously, since we were not setting scsi_req(req)->result in
> > scsi_queue_rq, I found that the application could receive another
> > DID_TRANSPORT_DISRUPTED host_status if the same 'struct request'
> > is allocated for the IO.
> >
> > Please let me know if I need to remove this change.
>
> Since SCSI LLDs have to set that result variable anyway if a request
> completes successfully I'd prefer not to add that assignment.
>
> > > > diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c
> > > > index 2018967..af1488d 100644
> > > > --- a/drivers/scsi/scsi_lib.c
> > > > +++ b/drivers/scsi/scsi_lib.c
> > > > @@ -1699,6 +1699,7 @@ static blk_status_t scsi_queue_rq(struct blk_mq_hw_ctx *hctx,
> > > >                 ret = BLK_STS_DEV_RESOURCE;
> > > >                 break;
> > > >         default:
> > > > +               scsi_req(req)->result = DID_NO_CONNECT << 16;
> > > >                 /*
> > > >                  * Make sure to release all allocated ressources when
> > > >                  * we hit an error, as we will never see this command
> > >
> > > What leads you to the conclusion that (ret != BLK_STS_OK &&
> > > ret != BLK_STS_RESOURCE) means that there is a connectivity issue?
> >
> > I found this is what we are doing for the legacy queue case; I referred to
> > the scsi_prep_return() and scsi_kill_request() code, where we always
> > return DID_NO_CONNECT.
> >
> > However, I think proper return code handling should be something like:
> >
> > diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c
> > index 2018967..21e516e 100644
> > --- a/drivers/scsi/scsi_lib.c
> > +++ b/drivers/scsi/scsi_lib.c
> > @@ -1699,6 +1699,10 @@ static blk_status_t scsi_queue_rq(struct blk_mq_hw_ctx *hctx,
> >                 ret = BLK_STS_DEV_RESOURCE;
> >                 break;
> >         default:
> > +               if (unlikely(!scsi_device_online(sdev)))
> > +                       scsi_req(req)->result = DID_NO_CONNECT << 16;
> > +               else
> > +                       scsi_req(req)->result = DID_ERROR << 16;
> >                 /*
> >                  * Make sure to release all allocated ressources when
> >                  * we hit an error, as we will never see this command
>
> The above looks better to me than the original patch.
>
> Thanks,
>
> Bart.
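
For readers following the `<< 16` in the patches above: the SCSI result word carries the host status in bits 16..23, which is the field an SG_IO caller ends up seeing as host_status. The following is a minimal userspace sketch of that encoding, not kernel code. The DID_* values and host_byte() are meant to mirror the kernel's definitions; make_result() and dispatch_failure_result() are hypothetical names introduced only for illustration, and the boolean argument stands in for scsi_device_online(sdev).

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Host-byte codes; values intended to mirror the kernel's SCSI headers. */
#define DID_OK          0x00
#define DID_NO_CONNECT  0x01
#define DID_ERROR       0x07

/* Hypothetical helper: pack a host-byte code into bits 16..23 of a result word. */
static inline uint32_t make_result(uint32_t host)
{
    assert(host <= 0xff);   /* the host status is a single byte */
    return host << 16;
}

/* Extract the host byte again, in the way the kernel's host_byte() macro does. */
static inline uint32_t host_byte(uint32_t result)
{
    return (result >> 16) & 0xff;
}

/*
 * Hypothetical helper mirroring the default: branch proposed above.
 * An offline device reports DID_NO_CONNECT; any other dispatch
 * failure reports DID_ERROR.
 */
static inline uint32_t dispatch_failure_result(bool device_online)
{
    return make_result(device_online ? DID_ERROR : DID_NO_CONNECT);
}
```

With these definitions, host_byte(dispatch_failure_result(false)) yields DID_NO_CONNECT and host_byte(dispatch_failure_result(true)) yields DID_ERROR, whereas a result word left at 0 reads back as DID_OK, which is how a failed dispatch could go unnoticed by the caller.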