On 2019-03-13 6:15 p.m., James Smart wrote:
On 3/13/2019 2:44 PM, Bart Van Assche wrote:
On Wed, 2019-03-13 at 13:19 -0700, James Smart wrote:
I've got an oops for the following stack:
...
lpfc_send_taskmgmt+0x28a [lpfc]
lpfc_bus_reset_handler+0x16a [lpfc]
scsi_try_bus_reset+0x3a
scsi_ioctl_reset+0x143
scsi_ioctl+0x18e [sg]
The issue is that lpfc eventually calls blk_mq_unique_tag() as
everything is scsi-mq enabled. blk_mq_unique_tag() oops as rq->mq_hctx
is NULL. It is NULL as scsi_ioctl_reset built up a dummy request struct
without a queue:
blk_rq_init(NULL, rq);
scmd = (struct scsi_cmnd *)(rq + 1);
scsi_init_command(dev, scmd);
scmd->request = rq;
scmd->cmnd = scsi_req(rq)->cmd;
What's the best way to approach fixing this ?
SCSI LLD bus reset handlers may use the members initialized by scsi_ioctl_reset()
but must not assume that all other struct scsi_cmnd members are valid. Several
years ago a similar crash was fixed in ib_srp.
Bart.
seems rather nuanced to me that scsi_cmnd->request may be non-null, yet elements
of the request may not be initialized.
Looks like the LLD is meant to notice:
shost->tmf_in_progress = 1;
which should only be set in this situation. But that may slow your fastpath.
Or said another way - the driver is to discern cmnds used for reset requests
from cmnds issued through the queuecommand - and *not* perform blk_mq operations
on the reset requests ?
which seems wrong as well - you want every driver to invent it's own way to know
actions are via a reset handler and to special case the cmnd use. Why don't we
fix it at the cause rather than making every driver fix it ?
I agree that scsi_ioctl_reset() should be taught how to produce a request
that doesn't blow up intermediate code expecting all requests to be well
made with respect to mq.
Doug Gilbert
And did we document this in the mid/low api ?
-- james