Re: [PATCH] [SCSI] libsas panic when single phy disabled on a wide port

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Mon, Oct 3, 2011 at 8:58 AM, Mark Salyzyn
<mark_salyzyn@xxxxxxxxxxxxxx> wrote:
> I am talking to myself ;->. Dan (et al), I inspected deeper and I
> *think* I see the culprit in sas_deform_port().

Thanks Mark, some notes below to help the investigation...

>
> Could the problem be that the code should have been refactored to:
>
> +       BUG_ON(!phy->port);
>        sas_port_delete_phy(phy->port, phy->phy);
> -       if (phy->port->num_phys == 0)
> +       if (phy->port->num_phys == 0) {
>                sas_port_delete(phy->port);
> -       phy->port = NULL;
> +               phy->port = NULL;
> +       }
>
> And also fixed up is sas_ex_discover_end_dev() at out_err (this is on
> soft territory, need to dig into sas_port_delete):
>
>  out_free:
> -       sas_port_delete(phy->port);
> +         if (phy->port->num_phys == 0) {
> +              sas_port_delete(phy->port);
>  out_err:
> -       phy->port = NULL;
> +              phy->port = NULL;
> +         }

I think we are ok in this case because we checked for wide port
membership before entering this routine, so the number of phys could
only ever be 1 at this point, right?

>
> And finally in sas_prot.c at sas_deform_port() (simplified, I still view
> this as split-brained though):
>
>        if (port->num_phys == 1) {
>                if (dev && gone)
>                        dev->gone = 1;
>                sas_unregister_domain_devices(port);
>                sas_port_delete(port->port);
>                port->port = NULL;
> -       } else
> +         } else {
>                sas_port_delete_phy(port->port, phy->phy);
> +                   if (si->dft->lldd_port_deformed)
> +                               si->dfp->lldd_port_deformed(phy);
> +                   return;
> +         }
>
>        if (si->dft->lldd_port_deformed)
>                si->dft->lldd_port_deformed(phy);
>
>        spin_lock_irqsave(&sas_ha->phy_port_lock, flags);
>        spin_lock(&port->phy_list_lock);
>
>        list_del_init(&phy->port_phy_el);
>        phy->port = NULL;
>        port->num_phys--;
>        port->phy_mask &= ~(1U << phy->id);
>
>        if (port->num_phys == 0) {
>                INIT_LIST_HEAD(&port->phy_list);
>                memset(port->sas_addr, 0, SAS_ADDR_SIZE);
>                memset(port->attached_sas_addr, 0, SAS_ADDR_SIZE);
>                port->class = 0;
>                port->iproto = 0;
>                port->tproto = 0;
>                port->oob_mode = 0;
>                port->phy_mask = 0;
>        }
>        spin_unlock(&port->phy_list_lock);
>        spin_unlock_irqrestore(&sas_ha->phy_port_lock, flags);
>
>        return;
>
> The way I see it, sas_deform_port() is the culprit that is clearing out
> phy->port when the num_phys is equal to two, falling through and
> decrementing the count one more time.

That initially looked convincing... but on closer look
sas_port_delete_phy is reducing the number of phys in the transport
class representation of the port (struct sas_port), while the
decrement that sas_deform port is doing is operating the on the
host-local port (struct asd_sas_port).

What is your reproduction scenario for this bug?  We see this in our
larger topology hotplug tests, but I don't think we have a simple
reproducer.

--
Dan
--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
[Index of Archives]     [SCSI Target Devel]     [Linux SCSI Target Infrastructure]     [Kernel Newbies]     [IDE]     [Security]     [Git]     [Netfilter]     [Bugtraq]     [Yosemite News]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Linux ATA RAID]     [Linux IIO]     [Samba]     [Device Mapper]
  Powered by Linux