On Mon, 2012-08-27 at 12:13 -0400, John Drescher wrote:
> >> I have bisected it down to the following patch:
> >>
> >> Bisecting: 0 revisions left to test after this (roughly 0 steps)
> >> [10f8d5b86743b33d841a175303e2bf67fd620f42] SCSI: fix hot unplug vs
> >> async scan race
> >>
> >> It appears this patch caused the bad behavior although I have not
> >> tested that yet. I am rebuilding the array (takes ~2 hours) from the
> >> previous good bisect.
> >>
> > Confirmed. This patch appears to cause the bug in my test setup.
>
> [ 339.406778] BUG: soft lockup - CPU#2 stuck for 23s! [kworker/u:8:2202]
[..]
> [ 339.415268] [<ffffffff8141782a>] scsi_remove_target+0xda/0x1f0

I wonder if we are preventing scsi_device_dev_release_usercontext() from
making forward progress?

...the attached patch should confirm this or give more info otherwise.

--
Dan

scsi_remove_target: debug softlockup

From: Dan Williams <djbw@xxxxxx>

dump more info in the case where we get stuck trying to remove a device.
---
 drivers/scsi/scsi_sysfs.c |   19 +++++++++++++++++++
 1 file changed, 19 insertions(+)

diff --git a/drivers/scsi/scsi_sysfs.c b/drivers/scsi/scsi_sysfs.c
index 093d4f6..011f8ee 100644
--- a/drivers/scsi/scsi_sysfs.c
+++ b/drivers/scsi/scsi_sysfs.c
@@ -1032,8 +1032,11 @@ void scsi_remove_target(struct device *dev)
 {
 	struct Scsi_Host *shost = dev_to_shost(dev->parent);
 	struct scsi_target *starget, *found;
+	struct scsi_target *found_log[3];
 	unsigned long flags;
 
+	memset(found_log, 0, sizeof(found_log));
+
  restart:
 	found = NULL;
 	spin_lock_irqsave(shost->host_lock, flags);
@@ -1041,8 +1044,24 @@ void scsi_remove_target(struct device *dev)
 		if (starget->state == STARGET_DEL)
 			continue;
 		if (starget->dev.parent == dev || &starget->dev == dev) {
+			int i;
+
 			found = starget;
 			found->reap_ref++;
+			for (i = 0; i < ARRAY_SIZE(found_log); i++)
+				if (!found_log[i]) {
+					found_log[i] = found;
+					break;
+				} else if (found_log[i] == found) {
+					struct scsi_device *sdev = NULL;
+
+					if (!list_empty(&found->devices))
+						sdev = list_entry(found->devices.next, typeof(*sdev), same_target_siblings);
+					pr_err_once("%s[%d]: reap %d:%d state: %d reap: %d dev_del: %d\n",
+						    __func__, i, found->channel, found->id,
+						    found->state, found->reap_ref,
+						    sdev ? work_busy(&sdev->ew.work) ? 2 : 1 : 0);
+				}
 			break;
 		}
 	}
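
For anyone reading the debug patch without a kernel tree handy, the found_log[] bookkeeping can be sketched in plain userspace C. This is only an illustration of the idea, not the kernel code: found_seen_check() and FOUND_LOG_SLOTS are hypothetical names, and the locking, reap_ref handling and pr_err_once() reporting are left out. The helper remembers the first few pointers it is handed and reports when one of them comes back, which is the condition the patch uses to decide that the restart loop keeps hitting the same target.

```c
#include <stddef.h>

#define FOUND_LOG_SLOTS 3	/* same size as found_log[] in the patch */

/*
 * Hypothetical userspace helper, not part of the kernel patch.
 *
 * Returns the slot index if 'item' was recorded on an earlier call
 * (i.e. the caller is revisiting it), or -1 if it has not been seen.
 * A new item is recorded only while a free slot remains; once the
 * table is full, unknown items are silently dropped.
 */
static int found_seen_check(const void *seen[FOUND_LOG_SLOTS], const void *item)
{
	size_t i;

	for (i = 0; i < FOUND_LOG_SLOTS; i++) {
		if (!seen[i]) {
			seen[i] = item;	/* first sighting: remember it */
			return -1;
		}
		if (seen[i] == item)
			return (int)i;	/* seen before: caller is looping */
	}

	return -1;	/* table full and no match: cannot tell */
}
```

A caller would zero the table once before its retry loop (the patch does this with memset()) and call the helper each time it picks a candidate; a non-negative return is the point where the patch dumps the target state.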