https://bugzilla.kernel.org/show_bug.cgi?id=212337 --- Comment #5 from Luis Chamberlain (mcgrof@xxxxxxxxxx) --- (In reply to Luis Chamberlain from comment #4) > (In reply to d gilbert from comment #3) > > Of course, commencing device teardown during an async scan is stress > testing > > that code. > > I'm afraid scsi_debug is filled with bug bombs bound to happen, because it > was written without certain consideration of races now coming up as tangible > with syfs / driver load. Namely, if you hold a lock at init and also use it > on sysfs attributes you can easily deadlock. I discovered this issue first > with the zram driver, and fixed the issue with try_module_get()'s on each > driver sysfs attribute, I posted patches for that, for discussion on that > see the post [0] [1], although discussion is mostly on the first patch, the > patch you want to look at is the second one [1]. > > [0] https://lkml.kernel.org/r/20210306022035.11266-2-mcgrof@xxxxxxxxxx > [1] https://lkml.kernel.org/r/20210306022035.11266-3-mcgrof@xxxxxxxxxx > > I considered fixing scsi_debug in light of this, but given that module > initialization is *also* calling helpers used by syfs attributes, *and* this > is also true at module removal, I'm afraid much more care is needed here. In > my patch to zram for the sysfs issue I mention ways to trigger the deadlock, > if you're up for the task to fix that, it would be wonderful. But hey, these > are separate issues. just figured you should be aware. That was a long winded way of saying -- yes stress testing async scan + teardown may be good, but at this point in time *these* other issues I think are of higher priority before one can even consider stress testing async scan + teardown. Oh and also, someone should probably consider if this is required or not, I have a hunch it may, but not sure: commit 48f3c4f354ce113f45cb5adbebe0f1f06f1487f7 Author: Luis Chamberlain <mcgrof@xxxxxxxxxx> Date: Thu Mar 18 15:22:34 2021 +0000 scsi_lib: try module get before running queue Signed-off-by: Luis Chamberlain <mcgrof@xxxxxxxxxx> diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c index ffe824782647..0af38f8936e4 100644 --- a/drivers/scsi/scsi_lib.c +++ b/drivers/scsi/scsi_lib.c @@ -22,6 +22,7 @@ #include <linux/scatterlist.h> #include <linux/blk-mq.h> #include <linux/ratelimit.h> +#include <linux/module.h> #include <asm/unaligned.h> #include <scsi/scsi.h> @@ -1527,6 +1528,12 @@ static int scsi_dispatch_cmd(struct scsi_cmnd *cmd) } + if (!try_module_get(host->hostt->module)) { + cmd->result = (DID_NO_CONNECT << 16); + goto done; + + } + trace_scsi_dispatch_cmd_start(cmd); rtn = host->hostt->queuecommand(host, cmd); if (rtn) { @@ -1538,6 +1545,7 @@ static int scsi_dispatch_cmd(struct scsi_cmnd *cmd) SCSI_LOG_MLQUEUE(3, scmd_printk(KERN_INFO, cmd, "queuecommand : request rejected\n")); } + module_put(host->hostt->module); return rtn; done: -- You may reply to this email to add a comment. You are receiving this mail because: You are watching the assignee of the bug.