https://bugzilla.kernel.org/show_bug.cgi?id=212337 --- Comment #6 from Luis Chamberlain (mcgrof@xxxxxxxxxx) --- (In reply to Luis Chamberlain from comment #5) > (In reply to Luis Chamberlain from comment #4) > > (In reply to d gilbert from comment #3) > > > Of course, commencing device teardown during an async scan is stress > > testing > > > that code. > > > > I'm afraid scsi_debug is filled with bug bombs bound to happen, because it > > was written without certain consideration of races now coming up as > tangible > > with syfs / driver load. Namely, if you hold a lock at init and also use it > > on sysfs attributes you can easily deadlock. I discovered this issue first > > with the zram driver, and fixed the issue with try_module_get()'s on each > > driver sysfs attribute, I posted patches for that, for discussion on that > > see the post [0] [1], although discussion is mostly on the first patch, the > > patch you want to look at is the second one [1]. > > > > [0] https://lkml.kernel.org/r/20210306022035.11266-2-mcgrof@xxxxxxxxxx > > [1] https://lkml.kernel.org/r/20210306022035.11266-3-mcgrof@xxxxxxxxxx > > > > I considered fixing scsi_debug in light of this, but given that module > > initialization is *also* calling helpers used by syfs attributes, *and* > this > > is also true at module removal, I'm afraid much more care is needed here. > In > > my patch to zram for the sysfs issue I mention ways to trigger the > deadlock, > > if you're up for the task to fix that, it would be wonderful. But hey, > these > > are separate issues. just figured you should be aware. > > That was a long winded way of saying -- yes stress testing async scan + > teardown may be good, but at this point in time *these* other issues I think > are of higher priority before one can even consider stress testing async > scan + teardown. > > Oh and also, someone should probably consider if this is required or not, I > have a hunch it may, but not sure: > > commit 48f3c4f354ce113f45cb5adbebe0f1f06f1487f7 > Author: Luis Chamberlain <mcgrof@xxxxxxxxxx> > Date: Thu Mar 18 15:22:34 2021 +0000 > > scsi_lib: try module get before running queue > > Signed-off-by: Luis Chamberlain <mcgrof@xxxxxxxxxx> > > diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c > index ffe824782647..0af38f8936e4 100644 > --- a/drivers/scsi/scsi_lib.c > +++ b/drivers/scsi/scsi_lib.c > @@ -22,6 +22,7 @@ > #include <linux/scatterlist.h> > #include <linux/blk-mq.h> > #include <linux/ratelimit.h> > +#include <linux/module.h> > #include <asm/unaligned.h> > > #include <scsi/scsi.h> > @@ -1527,6 +1528,12 @@ static int scsi_dispatch_cmd(struct scsi_cmnd *cmd) > > } > > + if (!try_module_get(host->hostt->module)) { > + cmd->result = (DID_NO_CONNECT << 16); > + goto done; > + > + } > + > trace_scsi_dispatch_cmd_start(cmd); > rtn = host->hostt->queuecommand(host, cmd); > if (rtn) { > @@ -1538,6 +1545,7 @@ static int scsi_dispatch_cmd(struct scsi_cmnd *cmd) > SCSI_LOG_MLQUEUE(3, scmd_printk(KERN_INFO, cmd, > "queuecommand : request rejected\n")); > } > + module_put(host->hostt->module); > > return rtn; > done: I should explain a bit more here. Refcounting on the driver's queuecommand() is possible for sure, *but* in scsi_debug's case its quite complex given we have 3 ways to queue commands: * right away * work queue * hrtimer Yes, you can refcount at the top, but the code is easier to manage / if the library does it. And if its indeed correct, better it goes there. -- You may reply to this email to add a comment. You are receiving this mail because: You are watching the assignee of the bug.