On Mon, 31 Aug 2020, 9:18am, Daniel Wagner wrote:

> It was observed on an ISP8324 16Gb HBA with fw=8.08.203 (d0d5) that
> pkt->entry_type was MBX_IOCB_TYPE/0x39 with an sp->type SRB_SCSI_CMD
> which is invalid and should not be possible.
>
> A careful code review of the crash dump didn't reveal any short
> comings. Reading the entry_type from the crash dump shows the expected
> value of STATUS_TYPE/0x03 but the call trace shows that
> qla24xx_mbx_iocb_entry() is used.
>
> One possible explanation is when pkt->entry_type is read it doesn't
> contain the correct information. That means the driver observes an data
> race by the firmware.
>
> Signed-off-by: Daniel Wagner <dwagner@xxxxxxx>
> ---
>  drivers/scsi/qla2xxx/qla_isr.c | 30 ++++++++++++++++++++++++++++--
>  1 file changed, 28 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/scsi/qla2xxx/qla_isr.c b/drivers/scsi/qla2xxx/qla_isr.c
> index b787643f5031..22aa4c0b901d 100644
> --- a/drivers/scsi/qla2xxx/qla_isr.c
> +++ b/drivers/scsi/qla2xxx/qla_isr.c
> @@ -3392,6 +3392,33 @@ void qla24xx_nvme_ls4_iocb(struct scsi_qla_host *vha,
>  	sp->done(sp, comp_status);
>  }
>
> +static void qla24xx_process_mbx_iocb_response(struct scsi_qla_host *vha,
> +	struct rsp_que *rsp, struct sts_entry_24xx *pkt)
> +{
> +	srb_t *sp;
> +
> +	sp = qla2x00_get_sp_from_handle(vha, rsp->req, pkt);
> +	if (!sp)
> +		return;
> +
> +	if (sp->type == SRB_SCSI_CMD ||
> +	    sp->type == SRB_NVME_CMD ||
> +	    sp->type == SRB_TM_CMD) {
> +		/* Some firmware version don't update the entry_type
> +		 * correctly. It was observed entry_type contained
> +		 * MBCX_IOCB_TYPE instead of the expected STATUS_TYPE
> +		 * for sp->type SRB_SCSI_CMD, SRB_NVME_CMD or
> +		 * SRB_TM_CMD.
> +		 */

Could you drop the above comment about firmware, as it is speculation at
this point?

> +		ql_log(ql_log_warn, vha, 0x509d,
> +		    "Firmware didn't update entry_type correctly\n");
> +		qla2x00_status_entry(vha, rsp, pkt);
> +		return;

It'd be best to take a chip reset path, rather than assuming the packet is
good and having the appropriate handler called (hacky). An approach similar
to the one done at the beginning of qla2x00_get_sp_from_handle() is what I
had in mind.

> +	}
> +
> +	qla24xx_mbx_iocb_entry(vha, rsp->req, (struct mbx_24xx_entry *)pkt);
> +}
> +
>  /**
>   * qla24xx_process_response_queue() - Process response queue entries.
>   * @vha: SCSI driver HA context
> @@ -3499,8 +3526,7 @@ void qla24xx_process_response_queue(struct scsi_qla_host *vha,
>  			    (struct abort_entry_24xx *)pkt);
>  			break;
>  		case MBX_IOCB_TYPE:
> -			qla24xx_mbx_iocb_entry(vha, rsp->req,
> -			    (struct mbx_24xx_entry *)pkt);
> +			qla24xx_process_mbx_iocb_response(vha, rsp, pkt);

I'd have preferred a common approach across the different IOCB types as an
attempt to harden the code, but that will be a little more involved work.
This looks ok.

Regards,
-Arun
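
For illustration only, the chip reset path suggested in the review could be
modeled roughly as below. This is a user-space sketch, not driver code. The
mock_* types, the reset_requested field (standing in for whatever flag the
driver would set to request a chip reset, which is an assumption here) and
MOCK_SRB_MAILBOX are hypothetical, and the enum values are arbitrary. Only
the SRB_SCSI_CMD, SRB_NVME_CMD and SRB_TM_CMD names are taken from the patch.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical stand-ins: values are arbitrary, not the driver's. */
enum mock_srb_type {
	SRB_SCSI_CMD,
	SRB_NVME_CMD,
	SRB_TM_CMD,
	MOCK_SRB_MAILBOX	/* invented name for "a mailbox-style request" */
};

struct mock_srb {
	enum mock_srb_type type;
};

struct mock_host {
	bool reset_requested;	/* models a "chip reset needed" flag (assumed) */
};

/*
 * Returns true if the entry may be passed on to the mailbox IOCB handler,
 * false if it was rejected. On a type mismatch the packet is not trusted:
 * a warning is logged and a reset is requested instead of re-dispatching
 * the packet to the status handler.
 */
static bool mock_process_mbx_iocb_response(struct mock_host *vha,
					   const struct mock_srb *sp)
{
	if (sp == NULL)
		return false;

	if (sp->type == SRB_SCSI_CMD ||
	    sp->type == SRB_NVME_CMD ||
	    sp->type == SRB_TM_CMD) {
		fprintf(stderr,
			"Unexpected sp->type %d for a mailbox IOCB entry\n",
			(int)sp->type);
		vha->reset_requested = true;
		return false;
	}

	return true;
}
```

With this shape, a mailbox-type request passes through untouched, while a
SCSI, NVMe or TM request that shows up under a mailbox entry type flags the
host for reset and is dropped rather than handled.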