On 16-01-15 04:55 PM, James Bottomley wrote:
On Fri, 2016-01-15 at 07:46 -0800, James Bottomley wrote:
On Thu, 2016-01-14 at 16:46 -0500, Tejun Heo wrote:
SCSI command completion path bumps ioerr_cnt whenever scsi_cmd
->result isn't zero; unfortunately, this means that non-error sense
reporting bumps the counter too. This is pronounced with ATA
passthrough commands because most of them explicitly request the
resulting taskfile to be transported via sense data bumping the
count
unconditionally.
Don't bump the counter if scsi_cmd->result simply indicates that
sense data is available.
Signed-off-by: Tejun Heo <tj@xxxxxxxxxx>
Reported-by: Dave Jones <dsj@xxxxxx>
---
drivers/scsi/scsi_lib.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c
index fa6b2c4..e90e3f7 100644
--- a/drivers/scsi/scsi_lib.c
+++ b/drivers/scsi/scsi_lib.c
@@ -1622,7 +1622,8 @@ static void scsi_softirq_done(struct request
*rq)
INIT_LIST_HEAD(&cmd->eh_entry);
atomic_inc(&cmd->device->iodone_cnt);
- if (cmd->result)
+ if (cmd->result &&
+ cmd->result != ((DRIVER_SENSE << 24) |
SAM_STAT_CHECK_CONDITION))
atomic_inc(&cmd->device->ioerr_cnt);
OK, it makes sense to me that we don't include non-error check
conditions. However, then you shouldn't be checking DRIVER_SENSE.
We still have a few drivers that rely on the error handler to fetch
sense explicitly ... they could eventually return non-error
conditions as well.
Actually, I take this back: if we add your proposal, we never increment
the ioerr_cnt even for sense returns indicating failure. That looks to
be even worse than incrementing it too often.
The other problem is that if we do this for you, we should do the same
for SCSI with BUSY and QUEUE_FULL ... they indicate temporary retry
conditions and shouldn't be treated as errors.
I'll stop looking now before I find any more problems with the
statistics code ... I think it needs a rethink.
SCSI status and sense data is non-trivial to decode. It looks
like someone thought that one-liner would bypass a lot of hard
work. Most of the time sense data indicates an error but not
always. Even worse, it can contain vendor specific codes. If
this statistic is on the fast path then IMO it should be retired
(and any others like it). For backward compatibility set it to
0 once at initialization and document the change. Or you could
have a discouraging kernel config option such as
CONFIG_EXPENSIVE_SCSI_STATISTICS ("nonsensical" is another term
that comes to mind).
Doug Gilbert
--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html