On Thu, Nov 18, 2021 at 02:47:44PM +0000, Russell King (Oracle) wrote: > On Thu, Nov 18, 2021 at 04:20:39PM +0200, Vladimir Oltean wrote: > > On Thu, Nov 18, 2021 at 01:22:18PM +0000, Russell King (Oracle) wrote: > > > On Thu, Nov 18, 2021 at 02:03:34PM +0200, Vladimir Oltean wrote: > > > > On Wed, Nov 17, 2021 at 11:50:50PM +0100, Marek Behún wrote: > > > > > +static int mv3310_select_mactype(unsigned long *interfaces) > > > > > +{ > > > > > + if (test_bit(PHY_INTERFACE_MODE_USXGMII, interfaces)) > > > > > + return MV_V2_33X0_PORT_CTRL_MACTYPE_USXGMII; > > > > > + else if (test_bit(PHY_INTERFACE_MODE_SGMII, interfaces) && > > > > > + test_bit(PHY_INTERFACE_MODE_10GBASER, interfaces)) > > > > > + return MV_V2_33X0_PORT_CTRL_MACTYPE_10GBASER; > > > > > + else if (test_bit(PHY_INTERFACE_MODE_SGMII, interfaces) && > > > > > + test_bit(PHY_INTERFACE_MODE_RXAUI, interfaces)) > > > > > + return MV_V2_33X0_PORT_CTRL_MACTYPE_RXAUI; > > > > > + else if (test_bit(PHY_INTERFACE_MODE_SGMII, interfaces) && > > > > > + test_bit(PHY_INTERFACE_MODE_XAUI, interfaces)) > > > > > + return MV_V2_3310_PORT_CTRL_MACTYPE_XAUI; > > > > > + else if (test_bit(PHY_INTERFACE_MODE_10GBASER, interfaces)) > > > > > + return MV_V2_33X0_PORT_CTRL_MACTYPE_10GBASER_RATE_MATCH; > > > > > + else if (test_bit(PHY_INTERFACE_MODE_RXAUI, interfaces)) > > > > > + return MV_V2_33X0_PORT_CTRL_MACTYPE_RXAUI_RATE_MATCH; > > > > > + else if (test_bit(PHY_INTERFACE_MODE_XAUI, interfaces)) > > > > > + return MV_V2_3310_PORT_CTRL_MACTYPE_XAUI_RATE_MATCH; > > > > > + else if (test_bit(PHY_INTERFACE_MODE_SGMII, interfaces)) > > > > > + return MV_V2_33X0_PORT_CTRL_MACTYPE_10GBASER; > > > > > + else > > > > > + return -1; > > > > > +} > > > > > + > > > > > > > > I would like to understand this heuristic better. Both its purpose and > > > > its implementation. > > > > > > > > It says: > > > > (a) If the intersection between interface modes supported by the MAC and > > > > the PHY contains USXGMII, then use USXGMII as a MACTYPE > > > > (b) Otherwise, if the intersection contains both 10GBaseR and SGMII, then > > > > use 10GBaseR as MACTYPE > > > > (...) > > > > (c) Otherwise, if the intersection contains just 10GBaseR (no SGMII), then > > > > use 10GBaseR with rate matching as MACTYPE > > > > (...) > > > > (d) Otherwise, if the intersection contains just SGMII (no 10GBaseR), then > > > > use 10GBaseR as MACTYPE (no rate matching). > > > > > > What is likely confusing you is a misinterpretation of the constant. > > > MV_V2_33X0_PORT_CTRL_MACTYPE_10GBASER actually means the PHY will > > > choose between 10GBASE-R, 5GBASE-R, 2500BASE-X, and SGMII depending > > > on the speed negotiated by the media. In this setting, the PHY > > > dictates which interface mode will be used. > > > > > > I could have named "MV_V2_33X0_PORT_CTRL_MACTYPE_10GBASER" as > > > "MV_V2_33X0_PORT_CTRL_MACTYPE_10GBASER_5GBASER_2500BASEX_SGMII_AUTONEG_ON". > > > Similar with "MV_V2_33X0_PORT_CTRL_MACTYPE_10GBASER_NO_SGMII_AN", which > > > would be > > > "MV_V2_33X0_PORT_CTRL_MACTYPE_10GBASER_5GBASER_2500BASEX_SGMII_AUTONEG_OFF". > > > And "MV_V2_3310_PORT_CTRL_MACTYPE_XAUI" would be > > > "MV_V2_3310_PORT_CTRL_MACTYPE_XAUI_5GBASER_2500BASEX_SGMII_AUTONEG_ON". > > > > > > Clearly using such long identifiers would have been rediculous, > > > especially the second one at 74 characters. > > > > True, but at least there could be a comment above each definition. > > There's no size limit to that. > > > > > > First of all, what is MACTYPE exactly? And what is the purpose of > > > > changing it? What would happen if this configuration remained fixed, as > > > > it were? > > > > > > The PHY defines the MAC interface mode depending on the MACTYPE > > > setting selected and the results of the media side negotiation. > > > > > > I think the above answers your remaining questions. > > > > Ok, so going back to case (d). You said that the full name would be > > MV_V2_33X0_PORT_CTRL_MACTYPE_10GBASER_5GBASER_2500BASEX_SGMII_AUTONEG_ON. > > This means that when the only interface mode supported by the host would > > be SGMII, the PHY's MACTYPE is still configured to use 2500basex, > > 5gbaser, 10gbaser for the higher link speeds. Clearly this won't work. > > But on the other hand, the phylink validate method will remove > > 2500baseT, 5000baseT, 1000baseT from the advertising mask of the PHY, so > > the system will never end up operating at those speeds, so it should be fine. > > I think you mean 10000baseT rather than 1000baseT. With that correction, > you are then correct - with the media restricted to 1G or slower speeds, > and the phy in MACTYPE mode 4 (aka 10GBASE-R as the fastest interface > mode) it will permanently be talking SGMII to the host. Yes, I missed a zero. > > The reason why I'm looking at these patches is to see whether they would > > bring something useful to Aquantia PHYs. These come with firmware on a > > flash that is customized by Aquantia themselves based on the specifications > > of a single board. These PHYs have an ability which is very similar to > > what I'm seeing here, which is to select, for each negotiated link speed > > on the media side, the SERDES protocol to use on the system side. This > > is pre-programmed by the firmware, but could be fixed up by the > > operating system if done carefully. > > > > The way Layerscape boards use Aquantia PHYs is to always select the > > "rate matching" option and keep the SERDES protocol fixed, just > > configure the mini MAC inside the PHY to emit PAUSE frames towards the > > system to keep the data rate under control. We would be using these PHYs > > with the generic C45 driver, which would be mostly enough except for > > lack of PHY interrupts, because the firmware already configures > > everything. > > > > But on the other hand it gets a bit tiring, especially for PHYs on riser > > cards, to have to change firmware in order to test a different SERDES > > protocol, so we were experimenting with some changes in the PHY driver > > that would basically keep the firmware image fixed, and just fix up the > > configuration it made, and do things like "use 2500base-x for the > > 2500base-T speed, and sgmii for 1000base-T/100base-TX". The ability for > > a PHY to work on a board where its firmware image wasn't specifically > > designed for it comes in handy sometimes. > > You're going to get into problems with this on Layerscape, because > reconfiguring the Serdes etc is something I've tried to highlight > as being necessary to NXP since SolidRun started using LX2160A. I > think there's some slow progress towards that, but it's so slow that > I've basically given up caring about it on the Honeycomb/Clearfog CX > boards now. > > All the SFP cages on my Honeycomb have been configured for the most > useful mode to me - 1000BASE-X/SGMII, and I've given up caring about > USXGMII/10GBASE-R on those ports. Speaking of that, do you know of any SFP modules that would use USXGMII? It doesn't appear to be listed in the spec sheet when looking for that. > > I see that this patch set basically introduces the phydev->host_interfaces > > bitmap which is an attempt to find the answer to that question. But when > > will we know enough about phydev->host_interfaces in order to safely > > make decisions in the PHY driver based on it? phylink sets it, phylib > > does not. > > It won't be something phylib could set because phylib doesn't know > the capabilities of its user - it's information that would need to be > provided to phylib. So you're saying it would be in phylib's best interest to not set it at all, not even to a single bit corresponding to phydev->interface. So PHY drivers could work out this way whether they should operate in backwards compatibility mode or they could change MACTYPE at will. > > And many Aquantia systems use the generic PHY driver, as mentioned. > > Additionally, there are old device trees at play here, which only define > > the initial SERDES protocol. Would we be changing the behavior for those, > > in that we would be configuring the PHY to keep the SERDES protocol > > fixed whereas it would have dynamically changed before? > > We have the same situation on Macchiatobin. The 88X3310 there defaults > to MACTYPE mode 4, and we've supported this for years with DT describing > the interface as 10GBASE-R - because we haven't actually cared very much > what DT says up to this point for the 88X3310. As I said in my previous > reply, the 88X3310 effectively dictates what the PHY interface mode will > be, and that is communicated back through phylib to whoever is using > phylib. So what is the full backwards compatibility strategy with old DT blobs? Is it in this patch set? I didn't notice it. > > Another question is what to do if there are multiple ways of > > establishing a system-side link. For example 1000 Mbps can be achieved > > either through SGMII, or USXGMII with symbol replication, or 2500base-x > > with flow control, or 10GBaseR with flow control. And I want to test > > them all. What would I need to do to change the SERDES protocol from one > > value to the other? Changing the phy-mode array in the device tree would > > be one option, but that may not always be possible. > > First point to make here is that rate adaption at the PHY is really > not well supported in Linux, and there is no way to know via phylib if > a PHY is capable or not of rate adaption. > > Today, if you have a 10GBASE-R link between a PHY doing rate adaption > and the "MAC", then what you will get from phylib is: > > phydev->interface = PHY_INTERFACE_MODE_10GBASER; > phydev->speed = SPEED_1000; // result of media negotiation > phydev->duplex = DUPLEX_FULL; // result of media negotiation > phydev->pause = ...; // result of media negotiation > phydev->asym_pause = ...; // result of media negotiation > > which will, for the majority of implementations, result in the MAC being > forced to a 1G speed, possibly with or without pause enabled. > > Due to this, if phylink is being used, the parameters given to > mac_link_up/pcs_link_up will be the result of the media negotiation, not > what is required on the actual link. > > You mention "10GBaseR with flow control" but there is another > possibility that exists in real hardware out there. "10GBaseR without > flow control" and in that case, the MAC needs to pace its transmission > for the media speed (which is a good reason why mac_link_up should be > given the result of the media negotiation so it can do transmission > pacing.) > > I have a follow-up to the response I gave to Sean Anderson on rate- > adapting PHYs that I need to finish and send, and it would be better > to have any discussion on this topic after I've sent that reply and > follow-up to that reply. Ok, how would the MAC pace itself to send at a lower data rate, if the SERDES protocol is 10G and the PHY doesn't send it PAUSE frames back? At least Layerscape systems can't do this AFAIK. I can watch for updates on this on the other thread.