RE: [PATCH 27/27] scsi: ratelimit I/O error messages

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 




> -----Original Message-----
> From: Hannes Reinecke [mailto:hare@xxxxxxx]
> Sent: Monday, 20 October, 2014 1:53 AM
> To: James Bottomley
> Cc: Christoph Hellwig; linux-scsi@xxxxxxxxxxxxxxx; Elliott, Robert (Server
> Storage); Hannes Reinecke
> Subject: [PATCH 27/27] scsi: ratelimit I/O error messages
> 
> There can be quite a lot of I/O error messages, even on smaller
> machines. So we need to ratelimit them to not overwhelm logging.
> 
> Signed-off-by: Hannes Reinecke <hare@xxxxxxx>
> ---
>  drivers/scsi/scsi_lib.c | 30 ++++++++++++++++++------------
>  1 file changed, 18 insertions(+), 12 deletions(-)
> 
> diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c
> index 2df485f..9d248d2 100644
> --- a/drivers/scsi/scsi_lib.c
> +++ b/drivers/scsi/scsi_lib.c
> @@ -22,6 +22,7 @@
>  #include <linux/hardirq.h>
>  #include <linux/scatterlist.h>
>  #include <linux/blk-mq.h>
> +#include <linux/ratelimit.h>
> 
>  #include <scsi/scsi.h>
>  #include <scsi/scsi_cmnd.h>
> @@ -1037,18 +1038,23 @@ void scsi_io_completion(struct scsi_cmnd *cmd,
> unsigned int good_bytes)
>  	switch (action) {
>  	case ACTION_FAIL:
>  		/* Give up and fail the remainder of the request */
> -		if (unlikely(scsi_logging_level))
> -			level = SCSI_LOG_LEVEL(SCSI_LOG_MLQUEUE_SHIFT,
> -					       SCSI_LOG_MLQUEUE_BITS);
> -		/*
> -		 * if logging is enabled the failure will be printed
> -		 * in scsi_log_completion(), so avoid duplicate messages
> -		 */
> -		if (!level && !(req->cmd_flags & REQ_QUIET)) {
> -			scsi_print_result(cmd, NULL, FAILED);
> -			if (driver_byte(result) & DRIVER_SENSE)
> -				scsi_print_sense(cmd);
> -			scsi_print_command(cmd);
> +		if (unlikely(scsi_logging_level)) {

I don't think wrapping all that inside:
	if (unlikely(scsi_logging_level))
is correct.  That means unrelated logging levels (e.g., LLQUEUE)
have an impact on whether this print happens.

> +			static DEFINE_RATELIMIT_STATE(_rs,
> +					DEFAULT_RATELIMIT_INTERVAL,
> +					DEFAULT_RATELIMIT_BURST);
> +			level = SCSI_LOG_LEVEL(SCSI_LOG_MLCOMPLETE_SHIFT,
> +					       SCSI_LOG_MLCOMPLETE_BITS);
> +			/*
> +			 * if logging is enabled the failure will be printed
> +			 * in scsi_log_completion(), so avoid duplicate messages
> +			 */
> +			if (__ratelimit(&_rs) && !level &&
> +			    !(req->cmd_flags & REQ_QUIET)) {

__ratelimit should only be invoked if the other two are 
leading towards printing, since it is fairly involved
(including a spinlock):
		if (!level && !(req->cmd_flags & REQ_QUIET) && 
			(__ratelimit(&_rs)) {

> +				scsi_print_result(cmd, NULL, FAILED);
> +				if (driver_byte(result) & DRIVER_SENSE)
> +					scsi_print_sense(cmd);
> +				scsi_print_command(cmd);
> +			}
>  		}
>  		if (!scsi_end_request(req, error, blk_rq_err_bytes(req), 0))
>  			return;

Here are some results from removing devices during sequential reads.

With the scsi logging level clear, just 10 block layer prints show up
(that outer "if" blocks everything):
[48973.034156] blk_update_request: critical target error, dev sdr, sector 7447280
[48973.035298] blk_update_request: critical target error, dev sdr, sector 7591344
[48973.035302] blk_update_request: critical target error, dev sdr, sector 7591352
[48973.035351] blk_update_request: critical target error, dev sdr, sector 7591360
[48973.035355] blk_update_request: critical target error, dev sdr, sector 7591368
[48973.035391] blk_update_request: critical target error, dev sdr, sector 7591376
[48973.035394] blk_update_request: critical target error, dev sdr, sector 7591384
[48973.035436] blk_update_request: critical target error, dev sdr, sector 7591392
[48973.035439] blk_update_request: critical target error, dev sdr, sector 7591400
[48973.035471] blk_update_request: critical target error, dev sdr, sector 7591408

With scsi_logging_level --set --error=5, there are: 
* 10 "FAILED Result" prints
  * the first has no CDB
  * at the end, there is a much later CDB print not preceded by a FAILED print
* 10 "blk_update_request: critical target error" prints
  * their sector number matches the preceding CDB print
* 1 "blk_update_request: 539 callbacks suppressed" print
* fio printed 33 io_u errors
  * the first at offset 27118616, matching the first print
  * one of the later ones is at offset 28297168, matching the last print
* no "callbacks suppressed" messages from scsi_io_completion


[49190.331376] sd 2:0:0:1: [sds] FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
[49190.334394] sd 2:0:0:1: [sds] Sense Key : Hardware Error [current]
[49190.336516] sd 2:0:0:1: [sds] Add. Sense: Logical unit failure
[49190.337228] sd 2:0:0:1: [sds] FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
[49190.337231] sd 2:0:0:1: [sds] Sense Key : Hardware Error [current]
[49190.337234] sd 2:0:0:1: [sds] Add. Sense: Logical unit failure
[49190.337236] sd 2:0:0:1: [sds] CDB:
[49190.337241] Read(10): 28 00 01 9d cc 18 00 00 08 00
[49190.337243] blk_update_request: 539 callbacks suppressed
[49190.337245] blk_update_request: critical target error, dev sds, sector 27118616	 <0x019dcc18>
[49190.337252] sd 2:0:0:1: [sds] FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
[49190.337254] sd 2:0:0:1: [sds] Sense Key : Hardware Error [current]
[49190.337256] sd 2:0:0:1: [sds] Add. Sense: Logical unit failure
[49190.337258] sd 2:0:0:1: [sds] CDB:
[49190.337262] Read(10): 28 00 01 9d cc 20 00 00 08 00
[49190.337264] blk_update_request: critical target error, dev sds, sector 27118624
[49190.337268] sd 2:0:0:1: [sds] FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
[49190.337270] sd 2:0:0:1: [sds] Sense Key : Hardware Error [current]
[49190.337272] sd 2:0:0:1: [sds] Add. Sense: Logical unit failure
[49190.337274] sd 2:0:0:1: [sds] CDB:
[49190.337278] Read(10): 28 00 01 9d cc 28 00 00 08 00
[49190.337279] blk_update_request: critical target error, dev sds, sector 27118632
[49190.337284] sd 2:0:0:1: [sds] FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
[49190.337286] sd 2:0:0:1: [sds] Sense Key : Hardware Error [current]
[49190.337288] sd 2:0:0:1: [sds] Add. Sense: Logical unit failure
[49190.337289] sd 2:0:0:1: [sds] CDB:
[49190.337294] Read(10): 28 00 01 9d cc 30 00 00 08 00
[49190.337295] blk_update_request: critical target error, dev sds, sector 27118640
[49190.337300] sd 2:0:0:1: [sds] FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
[49190.337302] sd 2:0:0:1: [sds] Sense KeyHardware Error [current]
[49190.337304] sd 2:0:0:1: [sds] Add. Sense: Logical unit failure
[49190.337305] sd 2:0:0:1: [sds] CDB:
[49190.337310] Read(10): 28 00 01 9d cc 38 00 00 08 00
[49190.337311] blk_update_request: critical target error, dev sds, sector 27118648
[49190.337315] sd 2:0:0:1: [sds] FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
[49190.337317] sd 2:0:0:1: [sds] Sense Key : Hardware Error [current]
[49190.337319] sd 2:0:0:1: [sds] Add. Sense: Logical unit failure
[49190.337321] sd 2:0:0:1: [sds] CDB:
[49190.337324] Read(10): 28 00 01 9d cc 40 00 00 08 00
[49190.337326] blk_update_request: critical target error, dev sds, sector 27118656
[49190.337331] sd 2:0:0:1: [sds] FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
[49190.337333] sd 2:0:0:1: [sds] Sense Key : Hardware Error [current]
[49190.337335] sd 2:0:0:1: [sds] Add. Sense: Logical unit failure
[49190.337336] sd 2:0:0:1: [sds] CDB:
[49190.337340] Read(10): 28 00 01 9d cc 48 00 00 08 00
[49190.337342] blk_update_request: critical target error, dev sds, sector 27118664
[49190.337386] sd 2:0:0:1: [sds] FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
[49190.337388] sd 2:0:0:1: [sds] Sense Key : Hardware Error [current]
[49190.337390] sd 2:0:0:1: [sds] Add. Sense: Logical unit failure
[49190.337392] sd 2:0:0:1: [sds] CDB:
[49190.337396] Read(10): 28 00 01 9d cc 50 00 00 08 00
[49190.337397] blk_update_request: critical target error, dev sds, sector 27118672
[49190.337402] sd 2:0:0:1: [sds] FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
[49190.337404] sd 2:0:0:1: [sds] Sense Key : Hardware Error [current]
[49190.337406] sd 2:0:0:1: [sds] Add. Sense: Logical unit failure
[49190.337408] sd 2:0:0:1: [sds] CDB:
[49190.337412] Read(10): 28 00 01 9d cc 58 00 00 08 00
[49190.337413] blk_update_request: critical target error, dev sds, sector 27118680 <0x19dcc58>
[49190.337417] blk_update_request: critical target error, dev sds, sector 27118688 <0x19dcc60>
[49190.554043] sd 2:0:0:1: [sds] CDB:
[49190.555178] Read(10): 28 00 01 af c7 d0 00 00 08 00 <28297168>


Tag printing is not included in this patch series, so you cannot 
tell which lines go together.  The first three lines:
[49190.331376] sd 2:0:0:1: [sds] FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
[49190.334394] sd 2:0:0:1: [sds] Sense Key : Hardware Error [current]
[49190.336516] sd 2:0:0:1: [sds] Add. Sense: Logical unit failure

probably go with the last two lines:
[49190.554043] sd 2:0:0:1: [sds] CDB:
[49190.555178] Read(10): 28 00 01 af c7 d0 00 00 08 00 <28297168>

with the other 9 prints in between coming from another CPU.


--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html




[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
[Index of Archives]     [SCSI Target Devel]     [Linux SCSI Target Infrastructure]     [Kernel Newbies]     [IDE]     [Security]     [Git]     [Netfilter]     [Bugtraq]     [Yosemite News]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Linux ATA RAID]     [Linux IIO]     [Samba]     [Device Mapper]
  Powered by Linux