On Wed, Mar 14, 2012 at 2:33 PM, Dan Williams <dan.j.williams@xxxxxxxxx> wrote: > On Wed, 2012-03-14 at 17:11 -0400, Alan Stern wrote: >> I see. It's a nasty situation; I guess the best way to describe it is >> a conflict between the requirements of the PM and SCSI subsystems: >> >> The PM core runs suspend and resume operations in async >> threads, and these threads need to acquire the device lock; >> >> The sd driver needs to insure that async probing is finished >> when starting its remove routine, and it is called with the >> device lock held. >> >> The best solution might be to use a workqueue for sd's async probing >> instead of the "async" threads. Then the work routine could be >> cancelled without doing async_synchronize_full(). > > Hmm... I was wondering why we needed a kernel global sync. If this is > just for probe work what about just doing something like the following? > Keep the sync operation local to probe-work and not entangle with the > resume-work? I'll give this a shot when I get back to my test system. > > diff --git a/drivers/scsi/sd.c b/drivers/scsi/sd.c > index bd17cf8..ec69e35 100644 > --- a/drivers/scsi/sd.c > +++ b/drivers/scsi/sd.c > @@ -2629,6 +2629,8 @@ static void sd_probe_async(void *data, async_cookie_t cookie) > put_device(&sdkp->dev); > } > > +static LIST_HEAD(sd_probe_domain); > + > /** > * sd_probe - called during driver initialization and whenever a > * new scsi device is attached to the system. It is called once > @@ -2717,7 +2719,7 @@ static int sd_probe(struct device *dev) > dev_set_drvdata(dev, sdkp); > > get_device(&sdkp->dev); /* prevent release before async_schedule */ > - async_schedule(sd_probe_async, sdkp); > + async_schedule_domain(sd_probe_async, sdkp, &sd_probe_domain); > > return 0; > > @@ -2751,7 +2753,7 @@ static int sd_remove(struct device *dev) > sdkp = dev_get_drvdata(dev); > scsi_autopm_get_device(sdkp->device); > > - async_synchronize_full(); > + async_synchronize_full_domain(&sd_probe_domain); > blk_queue_prep_rq(sdkp->device->request_queue, scsi_prep_fn); > blk_queue_unprep_rq(sdkp->device->request_queue, NULL); > device_del(&sdkp->dev); This works fine for me for resolving the deadlock, but I found I also needed the following to fix a spurious: scsi 6:0:1:0: Illegal state transition deleted->running ...in the resume path. @@ -2348,10 +2349,11 @@ EXPORT_SYMBOL(scsi_device_quiesce); * * Must be called with user context, may sleep. */ -void -scsi_device_resume(struct scsi_device *sdev) +void scsi_device_resume(struct scsi_device *sdev) { - if(scsi_device_set_state(sdev, SDEV_RUNNING)) + /* check if the device was deleted during suspend */ + if (sdev->sdev_state == SDEV_DEL || + scsi_device_set_state(sdev, SDEV_RUNNING)) return; scsi_run_queue(sdev->request_queue); } Unless someone can point out something I'm missing I'll go ahead and roll this into it's own patch and rebase/drop the hack out of the libsas resume code. -- Dan -- To unsubscribe from this list: send the line "unsubscribe linux-scsi" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html