On Fri, 2012-04-13 at 16:37 -0700, Dan Williams wrote: > The following crash results from cases where the end_device has been > removed before scsi_sysfs_add_sdev has had a chance to run. > > BUG: unable to handle kernel NULL pointer dereference at 0000000000000098 > IP: [<ffffffff8115e100>] sysfs_create_dir+0x32/0xb6 > ... > Call Trace: > [<ffffffff8125e4a8>] kobject_add_internal+0x120/0x1e3 > [<ffffffff81075149>] ? trace_hardirqs_on+0xd/0xf > [<ffffffff8125e641>] kobject_add_varg+0x41/0x50 > [<ffffffff8125e70b>] kobject_add+0x64/0x66 > [<ffffffff8131122b>] device_add+0x12d/0x63a > [<ffffffff814b65ea>] ? _raw_spin_unlock_irqrestore+0x47/0x56 > [<ffffffff8107de15>] ? module_refcount+0x89/0xa0 > [<ffffffff8132f348>] scsi_sysfs_add_sdev+0x4e/0x28a > [<ffffffff8132dcbb>] do_scan_async+0x9c/0x145 > > ...teach sas_rphy_remove to wait for async scanning to quiesce before > removing the end_device. It seems this is a more general problem [1], > but this patch only addresses sas transport. > > [1]: 23edb6e [SCSI] mpt2sas: Do not set sas_device->starget to NULL from > the slave_destroy callback when all the LUNS have been deleted > > Signed-off-by: Dan Williams <dan.j.williams@xxxxxxxxx> > --- > drivers/scsi/scsi_transport_sas.c | 6 +++++- > 1 file changed, 5 insertions(+), 1 deletion(-) > > diff --git a/drivers/scsi/scsi_transport_sas.c b/drivers/scsi/scsi_transport_sas.c > index f7565fc..47abb90 100644 > --- a/drivers/scsi/scsi_transport_sas.c > +++ b/drivers/scsi/scsi_transport_sas.c > @@ -33,8 +33,9 @@ > #include <linux/bsg.h> > > #include <scsi/scsi.h> > -#include <scsi/scsi_device.h> > #include <scsi/scsi_host.h> > +#include <scsi/scsi_scan.h> > +#include <scsi/scsi_device.h> > #include <scsi/scsi_transport.h> > #include <scsi/scsi_transport_sas.h> > > @@ -1667,6 +1668,9 @@ sas_rphy_remove(struct sas_rphy *rphy) > { > struct device *dev = &rphy->dev; > > + /* prevent device_del() while child device_add() may be in-flight */ > + scsi_complete_async_scans(); > + > switch (rphy->identify.device_type) { This doesn't really fix the problem, it merely narrows the window (we still crash in the much shorter window if another async scan starts after you check for completion). Isn't the fix that will prevent all of this to hold the scan mutex across scsi_remove_device() ... in which case it should probably be part of scsi_remove_device()? James -- To unsubscribe from this list: send the line "unsubscribe linux-scsi" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html