On Mon, Apr 23, 2012 at 12:41 PM, Dan Williams <dan.j.williams@xxxxxxxxx> wrote: > On Sun, Apr 22, 2012 at 10:30 AM, James Bottomley > <James.Bottomley@xxxxxxxxxxxxxxxxxxxxx> wrote: >> On Fri, 2012-04-13 at 16:37 -0700, Dan Williams wrote: >>> When managing shost->host_eh_scheduled libata assumes that there is a >>> 1:1 shost-to-ata_port relationship. libsas creates a 1:N relationship >>> so it needs to manage host_eh_scheduled cumulatively at the host level. >>> The sched_eh and end_eh port port ops allow libsas to track when domain >>> devices enter/leave the "eh-pending" state under ha->lock (previously >>> named ha->state_lock, but it is no longer just a lock for ha->state >>> changes). >>> >>> Since host_eh_scheduled indicates eh without backing commands pinning >>> the device it can be deallocated at any time. Move the taking of the >>> domain_device reference under the port_lock to guarantee that the >>> ata_port stays around for the duration of eh. >> >>> Cc: Tejun Heo <tj@xxxxxxxxxx> >>> Acked-by: Jacek Danecki <jacek.danecki@xxxxxxxxx> >> >> Could we standardise on Acked-by, please. In my book it means the >> maintainer of a piece of code agrees with the change and lets me take it >> through my tree. I'm aware that not everyone uses this definition, so >> we can use a different standard from my current one, but what does it >> mean in this case? >> >>> Signed-off-by: Dan Williams <dan.j.williams@xxxxxxxxx> >>> --- >>> drivers/ata/libata-core.c | 4 ++ >>> drivers/ata/libata-eh.c | 57 ++++++++++++++++++++++++++++------- >>> drivers/scsi/libsas/sas_ata.c | 38 +++++++++++++++++++++-- >>> drivers/scsi/libsas/sas_discover.c | 6 ++-- >>> drivers/scsi/libsas/sas_event.c | 12 ++++--- >>> drivers/scsi/libsas/sas_init.c | 14 ++++----- >>> drivers/scsi/libsas/sas_scsi_host.c | 27 +++++++++++++---- >>> include/linux/libata.h | 4 ++ >>> include/scsi/libsas.h | 4 ++ >>> include/scsi/sas_ata.h | 5 +++ >>> 10 files changed, 134 insertions(+), 37 deletions(-) >> >> This is a pretty big change for rc fixes. None of the other changes in >> the series seem to be dependent on it, what bug is it actually fixing? > > It fixes two bugs (which in hindsight could potentially be split into > two patches). The major one is guarantees about when > host_eh_scheduled is cleared. Prior to this patch when at least one > ata_port in a domain has completed eh the flag for host is cleared. > This patch plus "scsi: fix eh wakeup (scsi_schedule_eh vs > scsi_restart_operations)" fixes up deadlocks (waiting indefinitely for > eh to complete) and cases where eh terminates too early > (host_eh_scheduled cleared with work pending) in our hot plug tests. > > The other bug this uncovered is the fact that libsas makes use of the > port and rphy object after libata has completed it's work, so this > patch also moved the taking of the domain_device reference to be under > spin_lock_irq(&sas_ha->phy_port_lock) and > spin_lock(&port->dev_list_lock). Otherwise, if no commands are > pending for the device libsas can proceed directly to freeing the > domain_device before the requested eh runs. > Ping, will this one be queued to scsi/fixes? All our hotplug testing was done with this patch in place, and it's straightforward to see that libata is mismanaging host_eh_scheduled in the libsas ata_port case. -- Dan -- To unsubscribe from this list: send the line "unsubscribe linux-scsi" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html