RE: Issue in sas_ex_discover_dev() for multiple level of SAS expanders in a domain

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



> -----Original Message-----
> From: John Garry <john.g.garry@xxxxxxxxxx>
> Sent: Tuesday, May 7, 2024 5:17 PM
> To: Li, Eric (Honggang) <Eric.H.Li@xxxxxxxx>; Jason Yan <yanaijie@xxxxxxxxxx>;
> james.bottomley@xxxxxxxxxxxxxxxxxxxxx; Martin K . Petersen <martin.petersen@xxxxxxxxxx>
> Cc: linux-scsi@xxxxxxxxxxxxxxx
> Subject: Re: Issue in sas_ex_discover_dev() for multiple level of SAS expanders in a domain
> 
> 
> [EXTERNAL EMAIL]
> 
> On 07/05/2024 09:44, Li, Eric (Honggang) wrote:
> > Resend this email.
> >
> >> -----Original Message-----
> >> From: Li, Eric (Honggang)
> >> Sent: Monday, May 6, 2024 9:50 AM
> >> To: John Garry <john.g.garry@xxxxxxxxxx>; Jason Yan
> >> <yanaijie@xxxxxxxxxx>; james.bottomley@xxxxxxxxxxxxxxxxxxxxx; Martin
> >> K . Petersen <martin.petersen@xxxxxxxxxx>
> >> Cc: linux-scsi@xxxxxxxxxxxxxxx
> >> Subject: RE: Issue in sas_ex_discover_dev() for multiple level of SAS
> >> expanders in a domain
> >>
> >>> -----Original Message-----
> >>> From: John Garry <john.g.garry@xxxxxxxxxx>
> >>> Sent: Friday, May 3, 2024 4:33 PM
> >>> To: Li, Eric (Honggang) <Eric.H.Li@xxxxxxxx>; Jason Yan
> >>> <yanaijie@xxxxxxxxxx>; james.bottomley@xxxxxxxxxxxxxxxxxxxxx; Martin
> >>> K . Petersen <martin.petersen@xxxxxxxxxx>
> >>> Cc: linux-scsi@xxxxxxxxxxxxxxx
> >>> Subject: Re: Issue in sas_ex_discover_dev() for multiple level of
> >>> SAS expanders in a domain
> >>>
> >>>
> >>> [EXTERNAL EMAIL]
> >>>
> >>> On 03/05/2024 04:15, Li, Eric (Honggang) wrote:
> >>>> John,
> >>>>
> >>>> I agree that the call to sas_ex_join_wide_port() is not mandatory.
> >>>> In fact, some logic here is similar to that function. We don't need
> >>>> to do it again.
> >>>> But just updating the phy_state may not be enough. I suppose you
> >>>> still need to add that PHY into the corresponding wide port by
> >>>> calling
> >>>> sas_port_add_phy() and update phy->port.
> >>>> Just updating the phy_state may avoid the port disabled in this
> >>>> issue, but other missing piece of work may cause other issues.
> >>>>
> >>>
> >>> If you check the commit in which that call to
> >>> sas_ex_join_wide_port() was originally added - 19252de - it was
> >>> added together with the comment "Due to races, the phy might not get
> >>> added to the wide port, so we add the phy to the wide port here".
> >>> However the code to set phy_state =
> >> PHY_STATE_DISCOVERED already existed before that commit.
> >>>
> >>> When all that code was removed in a1b6fb947f923, I am just wondering
> >>> if we have kept the phy_state = PHY_STATE_DISCOVERED code.
> >>>
> >>> Anyway, would we really join a wideport with the downstream expander?
> >>> I would just assume that we would be creating a new wideport, and a
> >>> subsequent scanned phy would be added to it.
> >>
> >> [Eric: ] I don't think the code removed in a1b6fb947f923 is the right way to fix this issue.
> >> It just happened avoiding this issue occurring.
> 
> Sure, I don't particularly like it as a fix either. But first I would like to get your setup working
> again to verify that only this needs fixing. Indeed something else may be broken since
> a1b6fb947f923. In addition, if we need to backport a fix, I would only like to backport a fix for
> real known broken topologies, and not theoretical issues not experienced.
> 
> But what exact setup do you have? I am confused, as you seem to be talking about your
> setup being broken, but then other setup may also being broken, but you don't have access
> to another setup. What is the other setup?
> 

This issue was reported by our tester. His setup is
SAS Controller <--> SAS Expander <--> SAS Expander <--> SAS drives
                                                                   <--> SAS Expander <--> SAS drives
                                                                   ......
When this issue occurred, no SAS drives could be detected.
As a workaround, I've already added those codes removed by the commit a1b6fb947f923, and at least we could detect the SAS drives.

Since our SAS expanders are self-configured, we don't need to explicitly configure the per-PHY routing tables.
So, I am not sure there is any other issue in this workaround as some per-PHY routing tables are not configured.

> 
> >> I think the root cause of this issue is the order of function calls
> >> to
> >> sas_dev_present_in_domain() and sas_ex_join_wide_port().
> >> My concern here is whether or not we still need to configure routing
> >> on the parent SAS expander before calling sas_ex_join_wide_port().
> >> This part of logic is not present previously and I don't have environment to test this.
> >>
> >> Back to your question, I believe we do need to join a wideport to downstream expander.
> >> Changing the phy_state to PHY_STATE_DISCOVERED will skip scanning
> >> those PHYs,
> 
> I would have thought that re-adding the code removed in a1b6fb947f923 to set
> PHY_STATE_DISCOVERED would only affect the first phy scanned in that wideport. Other
> phys scanned would have it set through calls to
> sas_ex_join_wide_port()
> 

I don't catch your point.
In my understanding, it would affect the rest PHYs in that wide port, not the first one.
The first PHY has been scanned/discovered and added to a port (at that time, it is just a narrow port yet).
Call to sas_ex_join_wide_port() makes the rest PHYs associated with that existing port (making it become wideport) and set up sysfs between the PHY and port. Set PHY_STATE_DISCOVERED would make the rest PHYs not being scanned/discovered again (as this wide port is already scanned).

> > by
> >> not calling the function sas_ex_discover_dev() to subsequential PHYs
> >>within this port (so  this issue wouldn’t hit). But those PHYs are not
> >>associated with a SAS port. This may trigger  other issues (for
> >>example, any change count changed on that PHY, or SAS topology in
> >>sysfs,
> >> etc.)
> >>ok, this can be considered more when I understand exactly what you
> >>have
> and what else you think may be broken.
> 
> Thanks,
> John
> 
> 





[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
[Index of Archives]     [SCSI Target Devel]     [Linux SCSI Target Infrastructure]     [Kernel Newbies]     [IDE]     [Security]     [Git]     [Netfilter]     [Bugtraq]     [Yosemite News]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Linux ATA RAID]     [Linux IIO]     [Samba]     [Device Mapper]

  Powered by Linux