> -----Original Message----- > From: John Garry <john.g.garry@xxxxxxxxxx> > Sent: Tuesday, May 7, 2024 5:17 PM > To: Li, Eric (Honggang) <Eric.H.Li@xxxxxxxx>; Jason Yan <yanaijie@xxxxxxxxxx>; > james.bottomley@xxxxxxxxxxxxxxxxxxxxx; Martin K . Petersen <martin.petersen@xxxxxxxxxx> > Cc: linux-scsi@xxxxxxxxxxxxxxx > Subject: Re: Issue in sas_ex_discover_dev() for multiple level of SAS expanders in a domain > > > [EXTERNAL EMAIL] > > On 07/05/2024 09:44, Li, Eric (Honggang) wrote: > > Resend this email. > > > >> -----Original Message----- > >> From: Li, Eric (Honggang) > >> Sent: Monday, May 6, 2024 9:50 AM > >> To: John Garry <john.g.garry@xxxxxxxxxx>; Jason Yan > >> <yanaijie@xxxxxxxxxx>; james.bottomley@xxxxxxxxxxxxxxxxxxxxx; Martin > >> K . Petersen <martin.petersen@xxxxxxxxxx> > >> Cc: linux-scsi@xxxxxxxxxxxxxxx > >> Subject: RE: Issue in sas_ex_discover_dev() for multiple level of SAS > >> expanders in a domain > >> > >>> -----Original Message----- > >>> From: John Garry <john.g.garry@xxxxxxxxxx> > >>> Sent: Friday, May 3, 2024 4:33 PM > >>> To: Li, Eric (Honggang) <Eric.H.Li@xxxxxxxx>; Jason Yan > >>> <yanaijie@xxxxxxxxxx>; james.bottomley@xxxxxxxxxxxxxxxxxxxxx; Martin > >>> K . Petersen <martin.petersen@xxxxxxxxxx> > >>> Cc: linux-scsi@xxxxxxxxxxxxxxx > >>> Subject: Re: Issue in sas_ex_discover_dev() for multiple level of > >>> SAS expanders in a domain > >>> > >>> > >>> [EXTERNAL EMAIL] > >>> > >>> On 03/05/2024 04:15, Li, Eric (Honggang) wrote: > >>>> John, > >>>> > >>>> I agree that the call to sas_ex_join_wide_port() is not mandatory. > >>>> In fact, some logic here is similar to that function. We don't need > >>>> to do it again. > >>>> But just updating the phy_state may not be enough. I suppose you > >>>> still need to add that PHY into the corresponding wide port by > >>>> calling > >>>> sas_port_add_phy() and update phy->port. > >>>> Just updating the phy_state may avoid the port disabled in this > >>>> issue, but other missing piece of work may cause other issues. > >>>> > >>> > >>> If you check the commit in which that call to > >>> sas_ex_join_wide_port() was originally added - 19252de - it was > >>> added together with the comment "Due to races, the phy might not get > >>> added to the wide port, so we add the phy to the wide port here". > >>> However the code to set phy_state = > >> PHY_STATE_DISCOVERED already existed before that commit. > >>> > >>> When all that code was removed in a1b6fb947f923, I am just wondering > >>> if we have kept the phy_state = PHY_STATE_DISCOVERED code. > >>> > >>> Anyway, would we really join a wideport with the downstream expander? > >>> I would just assume that we would be creating a new wideport, and a > >>> subsequent scanned phy would be added to it. > >> > >> [Eric: ] I don't think the code removed in a1b6fb947f923 is the right way to fix this issue. > >> It just happened avoiding this issue occurring. > > Sure, I don't particularly like it as a fix either. But first I would like to get your setup working > again to verify that only this needs fixing. Indeed something else may be broken since > a1b6fb947f923. In addition, if we need to backport a fix, I would only like to backport a fix for > real known broken topologies, and not theoretical issues not experienced. > > But what exact setup do you have? I am confused, as you seem to be talking about your > setup being broken, but then other setup may also being broken, but you don't have access > to another setup. What is the other setup? > This issue was reported by our tester. His setup is SAS Controller <--> SAS Expander <--> SAS Expander <--> SAS drives <--> SAS Expander <--> SAS drives ...... When this issue occurred, no SAS drives could be detected. As a workaround, I've already added those codes removed by the commit a1b6fb947f923, and at least we could detect the SAS drives. Since our SAS expanders are self-configured, we don't need to explicitly configure the per-PHY routing tables. So, I am not sure there is any other issue in this workaround as some per-PHY routing tables are not configured. > > >> I think the root cause of this issue is the order of function calls > >> to > >> sas_dev_present_in_domain() and sas_ex_join_wide_port(). > >> My concern here is whether or not we still need to configure routing > >> on the parent SAS expander before calling sas_ex_join_wide_port(). > >> This part of logic is not present previously and I don't have environment to test this. > >> > >> Back to your question, I believe we do need to join a wideport to downstream expander. > >> Changing the phy_state to PHY_STATE_DISCOVERED will skip scanning > >> those PHYs, > > I would have thought that re-adding the code removed in a1b6fb947f923 to set > PHY_STATE_DISCOVERED would only affect the first phy scanned in that wideport. Other > phys scanned would have it set through calls to > sas_ex_join_wide_port() > I don't catch your point. In my understanding, it would affect the rest PHYs in that wide port, not the first one. The first PHY has been scanned/discovered and added to a port (at that time, it is just a narrow port yet). Call to sas_ex_join_wide_port() makes the rest PHYs associated with that existing port (making it become wideport) and set up sysfs between the PHY and port. Set PHY_STATE_DISCOVERED would make the rest PHYs not being scanned/discovered again (as this wide port is already scanned). > > by > >> not calling the function sas_ex_discover_dev() to subsequential PHYs > >>within this port (so this issue wouldn’t hit). But those PHYs are not > >>associated with a SAS port. This may trigger other issues (for > >>example, any change count changed on that PHY, or SAS topology in > >>sysfs, > >> etc.) > >>ok, this can be considered more when I understand exactly what you > >>have > and what else you think may be broken. > > Thanks, > John > >