On Mon, Oct 3, 2011 at 8:58 AM, Mark Salyzyn <mark_salyzyn@xxxxxxxxxxxxxx> wrote: > I am talking to myself ;->. Dan (et al), I inspected deeper and I > *think* I see the culprit in sas_deform_port(). Thanks Mark, some notes below to help the investigation... > > Could the problem be that the code should have been refactored to: > > + BUG_ON(!phy->port); > sas_port_delete_phy(phy->port, phy->phy); > - if (phy->port->num_phys == 0) > + if (phy->port->num_phys == 0) { > sas_port_delete(phy->port); > - phy->port = NULL; > + phy->port = NULL; > + } > > And also fixed up is sas_ex_discover_end_dev() at out_err (this is on > soft territory, need to dig into sas_port_delete): > > out_free: > - sas_port_delete(phy->port); > + if (phy->port->num_phys == 0) { > + sas_port_delete(phy->port); > out_err: > - phy->port = NULL; > + phy->port = NULL; > + } I think we are ok in this case because we checked for wide port membership before entering this routine, so the number of phys could only ever be 1 at this point, right? > > And finally in sas_prot.c at sas_deform_port() (simplified, I still view > this as split-brained though): > > if (port->num_phys == 1) { > if (dev && gone) > dev->gone = 1; > sas_unregister_domain_devices(port); > sas_port_delete(port->port); > port->port = NULL; > - } else > + } else { > sas_port_delete_phy(port->port, phy->phy); > + if (si->dft->lldd_port_deformed) > + si->dfp->lldd_port_deformed(phy); > + return; > + } > > if (si->dft->lldd_port_deformed) > si->dft->lldd_port_deformed(phy); > > spin_lock_irqsave(&sas_ha->phy_port_lock, flags); > spin_lock(&port->phy_list_lock); > > list_del_init(&phy->port_phy_el); > phy->port = NULL; > port->num_phys--; > port->phy_mask &= ~(1U << phy->id); > > if (port->num_phys == 0) { > INIT_LIST_HEAD(&port->phy_list); > memset(port->sas_addr, 0, SAS_ADDR_SIZE); > memset(port->attached_sas_addr, 0, SAS_ADDR_SIZE); > port->class = 0; > port->iproto = 0; > port->tproto = 0; > port->oob_mode = 0; > port->phy_mask = 0; > } > spin_unlock(&port->phy_list_lock); > spin_unlock_irqrestore(&sas_ha->phy_port_lock, flags); > > return; > > The way I see it, sas_deform_port() is the culprit that is clearing out > phy->port when the num_phys is equal to two, falling through and > decrementing the count one more time. That initially looked convincing... but on closer look sas_port_delete_phy is reducing the number of phys in the transport class representation of the port (struct sas_port), while the decrement that sas_deform port is doing is operating the on the host-local port (struct asd_sas_port). What is your reproduction scenario for this bug? We see this in our larger topology hotplug tests, but I don't think we have a simple reproducer. -- Dan -- To unsubscribe from this list: send the line "unsubscribe linux-scsi" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html