On 2023/5/13 23:55, Ming Lei wrote:
Hello Guys,
scsi_device->iorequest_cnt/iodone_cnt/ioerr_cnt are only inc/dec,
and exported to userspace via sysfs in kernel, and not see any kernel
consumers, so the exported counters are only for userspace, just be curious
which/how applications consume them?
These counters are used to checking the disk health status in a certain scenario
for us.
These counters not only adds cost in fast path, but also causes kernel panic,
especially the "atomic_inc(&cmd->device->iorequest_cnt)" in
scsi_queue_rq(), because cmd->device may be freed after returning
from scsi_dispatch_cmd(), which is introduced by:
cfee29ffb45b ("scsi: core: Do not increase scsi_device's iorequest_cnt if dispatch failed")
If there aren't explicit applications which depend on these counters,
I'd suggest to kill the three since all are not in stable ABI. Otherwise
I think we need to revert cfee29ffb45b.
Sorry for introduce this bug.
We would check these counters after iodone_cnt equal to iorequest_cnt. So
cfee29ffb45b is aimed to fix the issue of iorequest_cnt is increased for
multiple times if scsi_dispatch_cmd() failed.
Maybe we should revert cfee29ffb45b and fix the original issue with following
changes:
diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c
index 03964b26f3f2..0226c9279cef 100644
--- a/drivers/scsi/scsi_lib.c
+++ b/drivers/scsi/scsi_lib.c
@@ -1485,6 +1485,7 @@ static int scsi_dispatch_cmd(struct scsi_cmnd *cmd)
*/
SCSI_LOG_MLQUEUE(3, scmd_printk(KERN_INFO, cmd,
"queuecommand : device blocked\n"));
+ atomic_dec(&cmd->device->iorequest_cnt);
return SCSI_MLQUEUE_DEVICE_BUSY;
}
@@ -1517,6 +1518,7 @@ static int scsi_dispatch_cmd(struct scsi_cmnd *cmd)
trace_scsi_dispatch_cmd_start(cmd);
rtn = host->hostt->queuecommand(host, cmd);
if (rtn) {
+ atomic_dec(&cmd->device->iorequest_cnt);
trace_scsi_dispatch_cmd_error(cmd, rtn);
if (rtn != SCSI_MLQUEUE_DEVICE_BUSY &&
rtn != SCSI_MLQUEUE_TARGET_BUSY)
[1] [linux]$ git grep -n -E "iorequest_cnt|iodone_cnt|ioerr_cnt" ./
Documentation/scsi/st.rst:222:value as iodone_cnt at the device level. The tape statistics only count
drivers/scsi/scsi_error.c:362: atomic_inc(&scmd->device->iodone_cnt);
drivers/scsi/scsi_lib.c:1428: atomic_inc(&cmd->device->iodone_cnt);
drivers/scsi/scsi_lib.c:1430: atomic_inc(&cmd->device->ioerr_cnt);
drivers/scsi/scsi_lib.c:1764: atomic_inc(&cmd->device->iorequest_cnt);
drivers/scsi/scsi_sysfs.c:968:show_sdev_iostat(iorequest_cnt);
drivers/scsi/scsi_sysfs.c:969:show_sdev_iostat(iodone_cnt);
drivers/scsi/scsi_sysfs.c:970:show_sdev_iostat(ioerr_cnt);
drivers/scsi/scsi_sysfs.c:1288: &dev_attr_iorequest_cnt.attr,
drivers/scsi/scsi_sysfs.c:1289: &dev_attr_iodone_cnt.attr,
drivers/scsi/scsi_sysfs.c:1290: &dev_attr_ioerr_cnt.attr,
drivers/scsi/sd.c:3531: atomic_set(&sdkp->device->ioerr_cnt, 0);
include/scsi/scsi_device.h:234: atomic_t iorequest_cnt;
include/scsi/scsi_device.h:235: atomic_t iodone_cnt;
include/scsi/scsi_device.h:236: atomic_t ioerr_cnt;
Thanks,
Ming