Re: [PATCH RESEND] scsi: megaraid_sas: make module parameter scmd_timeout writable

Lei Chen <lei.chen@xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx> · Tue, 9 Apr 2024 15:12:03 +0800

Sorry for the non plain text format.

On Mon, Apr 8, 2024 at 8:30 PM John Garry <john.g.garry@xxxxxxxxxx> wrote:
>
> On 08/04/2024 11:05, Lei Chen wrote:
> > When an scmd times out, block layer calls megasas_reset_timer to
> > make further decisions. scmd_timeout indicates when an scmd is really
> > timed-out.
>
> What does really timed-out mean?

scsi_times_out will call eh_timed_out (in megaraid driver, this
indicates megasas_reset_timer),
megasas_reset_timer determines whether a scmd is timed out. If not, it
will return
BLK_EH_RESET_TIMER to tell the block layer to reset the timer and do nothing.
>
>
> > If we want to make this process more fast, we can decrease
> > this value. This patch allows users to change this value in run-time.
> >
> > Signed-off-by: Lei Chen <lei.chen@xxxxxxxxxx>
> > ---
> >   drivers/scsi/megaraid/megaraid_sas_base.c | 2 +-
> >   1 file changed, 1 insertion(+), 1 deletion(-)
> >
> > diff --git a/drivers/scsi/megaraid/megaraid_sas_base.c b/drivers/scsi/megaraid/megaraid_sas_base.c
> > index 3d4f13da1ae8..2a165e5dc7a3 100644
> > --- a/drivers/scsi/megaraid/megaraid_sas_base.c
> > +++ b/drivers/scsi/megaraid/megaraid_sas_base.c
> > @@ -91,7 +91,7 @@ module_param(dual_qdepth_disable, int, 0444);
> >   MODULE_PARM_DESC(dual_qdepth_disable, "Disable dual queue depth feature. Default: 0");
> >
> >   static unsigned int scmd_timeout = MEGASAS_DEFAULT_CMD_TIMEOUT;
> > -module_param(scmd_timeout, int, 0444);
> > +module_param(scmd_timeout, int, 0644);
> >   MODULE_PARM_DESC(scmd_timeout, "scsi command timeout (10-90s), default 90s. See megasas_reset_timer.");
> >
> >   int perf_mode = -1;
>
> I don't know why megaraid_sas has special handling here (and bypasses
> SCSI midlayer).
>
> If the host is overloaded and you get a time-out as a command simply
> could not be handled in time, can you alternatively try reducing the
> scsi device queue depth?

Yeah, scsi layer and drivers already have some methods to control the
queue depth. For megaraid driver,
it will throttle queue depth in megasas_reset_timer. But since scsi
disks on the same megaraid card share
 the queue depth,  that will impact other scsi disks.
In most cases, a scsi disk is more likely to be misworking than a RAID
card, which makes scmd wrong and retry.
We want to adjust scmd_timeout without reloading the driver to make
scmds against abnormal scsi disks completed faster.