RE: [PATCH v4] qla2xxx: Fix unbound NVME response length

"Elliott, Robert (Servers)" <elliott@xxxxxxx> · Wed, 22 Jan 2020 23:59:07 +0000

> -----Original Message-----
> From: linux-scsi-owner@xxxxxxxxxxxxxxx <linux-scsi-owner@xxxxxxxxxxxxxxx>
> On Behalf Of Himanshu Madhani
> Sent: Tuesday, January 21, 2020 1:27 PM
> Subject: [PATCH v4] qla2xxx: Fix unbound NVME response length
...
> We discovered issue with our newer Gen7 adapter when response length
> happens to be larger than 32 bytes, could result into crash.
...
>  drivers/scsi/qla2xxx/qla_isr.c | 10 ++++++++++
>  1 file changed, 10 insertions(+)
> 
> diff --git a/drivers/scsi/qla2xxx/qla_isr.c
...
> +		if (unlikely(iocb->u.nvme.rsp_pyld_len >
> +		    sizeof(struct nvme_fc_ersp_iu))) {
> +			WARN_ONCE(1, "Unexpected response payload length %u.\n",
> +			    iocb->u.nvme.rsp_pyld_len);

Do you really need the kernel stack dump that the WARN macros
produce for this error? The problem would be caused by firmware
behavior, not by something wrong in the kernel.

If this function runs in interrupt context (based on the filename),
then printing lots of data to the slow serial port can cause soft
lockups and other issues.

> +			ql_log(ql_log_warn, fcport->vha, 0x5100,
> +			    "Unexpected response payload length %u.\n",
> +			    iocb->u.nvme.rsp_pyld_len);
> +			iocb->u.nvme.rsp_pyld_len =
> +			    sizeof(struct nvme_fc_ersp_iu);
> +		}

If the problem is due to some firmware incompatibility and every
response is long, the kernel log will quickly fill up with these
messages - per-IO prints are noisy. The handling implies the driver
thinks it's safe to proceed, so nothing keeps the problem from
recurring. If the handling were to report a failed IO and shut down
the device instead, the messages would quickly stop.

Safer approaches would be to print only once and maintain a count
of errors in sysfs, or use ratelimited print functions.
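
To illustrate the first option, here is a rough userspace sketch of
"print once, count always". It is not the driver code: ERSP_IU_MAX
stands in for sizeof(struct nvme_fc_ersp_iu) and is assumed to be 32
based on the patch description, the uint16_t length type is assumed,
and struct rsp_len_stats and clamp_rsp_len() are hypothetical names.

```c
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

/* Assumed stand-in for sizeof(struct nvme_fc_ersp_iu). */
#define ERSP_IU_MAX 32u

/* Hypothetical per-port state; the counter is what sysfs would show. */
struct rsp_len_stats {
	unsigned long oversize_count;	/* bumped on every oversized response */
	bool warned;			/* set after the first log message */
};

/*
 * Clamp an oversized response length to ERSP_IU_MAX.
 * Every occurrence is counted, but only the first one is logged,
 * so a misbehaving firmware cannot flood the log.
 */
static uint16_t clamp_rsp_len(struct rsp_len_stats *st, uint16_t len)
{
	if (len <= ERSP_IU_MAX)
		return len;

	st->oversize_count++;
	if (!st->warned) {
		st->warned = true;
		fprintf(stderr, "Unexpected response payload length %u.\n",
			(unsigned int)len);
	}
	return ERSP_IU_MAX;
}
```

In the kernel, the existing pr_warn_once() and pr_warn_ratelimited()
helpers would replace the hand-rolled flag, and neither produces the
stack dump that WARN_ONCE() does.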