RE: [PATCH 3/5] scsi: megaraid_sas - throttle io if FW is busy

"Patro, Sumant" <Sumant.Patro@xxxxxxx> · Thu, 15 Feb 2007 19:53:48 -0700

Hello James,

	I re-submitted the patch yesterday with the "space" issue fixed
(adhering to coding guideline).

	I will check for alternative to calculate the time driver have
been sending host busy to OS. Will check with time_before() as you have
suggested.

	Throttling from megasas_generic_reset() handler did not help.
megaraid does not have feature to abort cmds. So, in the generic reset
routine, the driver just waits for cmd completion by FW. These timed-out
cmds gets retried by mid-layer with "retries" incremented by 1.
Eventually we see retries equals max_allowed followed by SCSI error with
"DRIVER_TIMEOUT".

	By throttling from the megasas_queue_command we do not hit the
issue. In our test with this code, retries did not exceed 2. 

Regards,

Sumant 

-----Original Message-----
From: James Bottomley [mailto:James.Bottomley@xxxxxxxxxxxx] 
Sent: Thursday, February 15, 2007 4:11 PM
To: Patro, Sumant
Cc: akpm@xxxxxxxx; linux-scsi@xxxxxxxxxxxxxxx;
linux-kernel@xxxxxxxxxxxxxxx; Kolli, Neela; Yang, Bo; Patro, Sumant
Subject: Re: [PATCH 3/5] scsi: megaraid_sas - throttle io if FW is busy

On Tue, 2007-02-06 at 14:11 -0800, Sumant Patro wrote:
> Checks added in megasas_queue_command to know if FW is able to process

> commands within timeout period. If number of retries is 2 or greater, 
> the driver stops sending cmd to FW. IO is resumed if pending cmd count

> reduces to 16 or 5 seconds has elapsed from the time cmds were last 
> sent to FW.
> 
> Signed-off-by: Sumant Patro <sumant.patro@xxxxxxx>
> ---
> 
>  drivers/scsi/megaraid/megaraid_sas.c |   27 +++++++++++++++++++++++++
>  drivers/scsi/megaraid/megaraid_sas.h |    3 ++
>  2 files changed, 30 insertions(+)
> 
> diff -uprN 2.6.new-p2/drivers/scsi/megaraid/megaraid_sas.c
2.6.new-p3/drivers/scsi/megaraid/megaraid_sas.c
> --- 2.6.new-p2/drivers/scsi/megaraid/megaraid_sas.c	2007-02-06
08:43:40.000000000 -0800
> +++ 2.6.new-p3/drivers/scsi/megaraid/megaraid_sas.c	2007-02-06
08:50:40.000000000 -0800
> @@ -839,6 +839,7 @@ megasas_queue_command(struct scsi_cmnd *
>  	u32 frame_count;
>  	struct megasas_cmd *cmd;
>  	struct megasas_instance *instance;
> +	unsigned long sec;
>  
>  	instance = (struct megasas_instance *)
>  	    scmd->device->host->hostdata;
> @@ -856,6 +857,23 @@ megasas_queue_command(struct scsi_cmnd *
>  		goto out_done;
>  	}
>  
> +	/* Check if we can process cmds */
> +	if(instance->is_busy){
            ^                  ^
       space needed per linux coding style (and the rest of the file

> +		sec = (jiffies - instance->last_time) / HZ;

please don't do this.  You want to be using time_before() and
jiffies_to_msecs().  The space problems apply to the rest of the code

> +		if(sec<5)
> +			return SCSI_MLQUEUE_HOST_BUSY;
> +		else{
> +			instance->is_busy=0;
> +			instance->last_time=0;
> +		}
> +	}
> +
> +	if(scmd->retries>1){

I really don't think this is a good indicator of your firmware
necessarily having problems; I really think you might want to look at
other indicators ... jiffies_at_alloc might be better, or even
throttling from the abort handler, which must have been called before
you get to here if the command is actually timing out.

Timeout and abort has it's own throttle anyway, since we quiesce the
host before beginning error recovery ... are you sure this scheme
actually solves anything for your device?

James

-
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html