Re: [PATCH net-next 8/8] net: phy: marvell10g: select host interface configuration

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Thu, Nov 18, 2021 at 02:47:44PM +0000, Russell King (Oracle) wrote:
> On Thu, Nov 18, 2021 at 04:20:39PM +0200, Vladimir Oltean wrote:
> > On Thu, Nov 18, 2021 at 01:22:18PM +0000, Russell King (Oracle) wrote:
> > > On Thu, Nov 18, 2021 at 02:03:34PM +0200, Vladimir Oltean wrote:
> > > > On Wed, Nov 17, 2021 at 11:50:50PM +0100, Marek Behún wrote:
> > > > > +static int mv3310_select_mactype(unsigned long *interfaces)
> > > > > +{
> > > > > +	if (test_bit(PHY_INTERFACE_MODE_USXGMII, interfaces))
> > > > > +		return MV_V2_33X0_PORT_CTRL_MACTYPE_USXGMII;
> > > > > +	else if (test_bit(PHY_INTERFACE_MODE_SGMII, interfaces) &&
> > > > > +		 test_bit(PHY_INTERFACE_MODE_10GBASER, interfaces))
> > > > > +		return MV_V2_33X0_PORT_CTRL_MACTYPE_10GBASER;
> > > > > +	else if (test_bit(PHY_INTERFACE_MODE_SGMII, interfaces) &&
> > > > > +		 test_bit(PHY_INTERFACE_MODE_RXAUI, interfaces))
> > > > > +		return MV_V2_33X0_PORT_CTRL_MACTYPE_RXAUI;
> > > > > +	else if (test_bit(PHY_INTERFACE_MODE_SGMII, interfaces) &&
> > > > > +		 test_bit(PHY_INTERFACE_MODE_XAUI, interfaces))
> > > > > +		return MV_V2_3310_PORT_CTRL_MACTYPE_XAUI;
> > > > > +	else if (test_bit(PHY_INTERFACE_MODE_10GBASER, interfaces))
> > > > > +		return MV_V2_33X0_PORT_CTRL_MACTYPE_10GBASER_RATE_MATCH;
> > > > > +	else if (test_bit(PHY_INTERFACE_MODE_RXAUI, interfaces))
> > > > > +		return MV_V2_33X0_PORT_CTRL_MACTYPE_RXAUI_RATE_MATCH;
> > > > > +	else if (test_bit(PHY_INTERFACE_MODE_XAUI, interfaces))
> > > > > +		return MV_V2_3310_PORT_CTRL_MACTYPE_XAUI_RATE_MATCH;
> > > > > +	else if (test_bit(PHY_INTERFACE_MODE_SGMII, interfaces))
> > > > > +		return MV_V2_33X0_PORT_CTRL_MACTYPE_10GBASER;
> > > > > +	else
> > > > > +		return -1;
> > > > > +}
> > > > > +
> > > > 
> > > > I would like to understand this heuristic better. Both its purpose and
> > > > its implementation.
> > > > 
> > > > It says:
> > > > (a) If the intersection between interface modes supported by the MAC and
> > > >     the PHY contains USXGMII, then use USXGMII as a MACTYPE
> > > > (b) Otherwise, if the intersection contains both 10GBaseR and SGMII, then
> > > >     use 10GBaseR as MACTYPE
> > > > (...)
> > > > (c) Otherwise, if the intersection contains just 10GBaseR (no SGMII), then
> > > >     use 10GBaseR with rate matching as MACTYPE
> > > > (...)
> > > > (d) Otherwise, if the intersection contains just SGMII (no 10GBaseR), then
> > > >     use 10GBaseR as MACTYPE (no rate matching).
> > > 
> > > What is likely confusing you is a misinterpretation of the constant.
> > > MV_V2_33X0_PORT_CTRL_MACTYPE_10GBASER actually means the PHY will
> > > choose between 10GBASE-R, 5GBASE-R, 2500BASE-X, and SGMII depending
> > > on the speed negotiated by the media. In this setting, the PHY
> > > dictates which interface mode will be used.
> > > 
> > > I could have named "MV_V2_33X0_PORT_CTRL_MACTYPE_10GBASER" as
> > > "MV_V2_33X0_PORT_CTRL_MACTYPE_10GBASER_5GBASER_2500BASEX_SGMII_AUTONEG_ON".
> > > Similar with "MV_V2_33X0_PORT_CTRL_MACTYPE_10GBASER_NO_SGMII_AN", which
> > > would be
> > > "MV_V2_33X0_PORT_CTRL_MACTYPE_10GBASER_5GBASER_2500BASEX_SGMII_AUTONEG_OFF".
> > > And "MV_V2_3310_PORT_CTRL_MACTYPE_XAUI" would be
> > > "MV_V2_3310_PORT_CTRL_MACTYPE_XAUI_5GBASER_2500BASEX_SGMII_AUTONEG_ON".
> > > 
> > > Clearly using such long identifiers would have been rediculous,
> > > especially the second one at 74 characters.
> > 
> > True, but at least there could be a comment above each definition.
> > There's no size limit to that.
> > 
> > > > First of all, what is MACTYPE exactly? And what is the purpose of
> > > > changing it? What would happen if this configuration remained fixed, as
> > > > it were?
> > > 
> > > The PHY defines the MAC interface mode depending on the MACTYPE
> > > setting selected and the results of the media side negotiation.
> > > 
> > > I think the above answers your remaining questions.
> > 
> > Ok, so going back to case (d). You said that the full name would be
> > MV_V2_33X0_PORT_CTRL_MACTYPE_10GBASER_5GBASER_2500BASEX_SGMII_AUTONEG_ON.
> > This means that when the only interface mode supported by the host would
> > be SGMII, the PHY's MACTYPE is still configured to use 2500basex,
> > 5gbaser, 10gbaser for the higher link speeds. Clearly this won't work.
> > But on the other hand, the phylink validate method will remove
> > 2500baseT, 5000baseT, 1000baseT from the advertising mask of the PHY, so
> > the system will never end up operating at those speeds, so it should be fine.
> 
> I think you mean 10000baseT rather than 1000baseT. With that correction,
> you are then correct - with the media restricted to 1G or slower speeds,
> and the phy in MACTYPE mode 4 (aka 10GBASE-R as the fastest interface
> mode) it will permanently be talking SGMII to the host.

Yes, I missed a zero.

> > The reason why I'm looking at these patches is to see whether they would
> > bring something useful to Aquantia PHYs. These come with firmware on a
> > flash that is customized by Aquantia themselves based on the specifications
> > of a single board. These PHYs have an ability which is very similar to
> > what I'm seeing here, which is to select, for each negotiated link speed
> > on the media side, the SERDES protocol to use on the system side. This
> > is pre-programmed by the firmware, but could be fixed up by the
> > operating system if done carefully.
> > 
> > The way Layerscape boards use Aquantia PHYs is to always select the
> > "rate matching" option and keep the SERDES protocol fixed, just
> > configure the mini MAC inside the PHY to emit PAUSE frames towards the
> > system to keep the data rate under control. We would be using these PHYs
> > with the generic C45 driver, which would be mostly enough except for
> > lack of PHY interrupts, because the firmware already configures
> > everything.
> > 
> > But on the other hand it gets a bit tiring, especially for PHYs on riser
> > cards, to have to change firmware in order to test a different SERDES
> > protocol, so we were experimenting with some changes in the PHY driver
> > that would basically keep the firmware image fixed, and just fix up the
> > configuration it made, and do things like "use 2500base-x for the
> > 2500base-T speed, and sgmii for 1000base-T/100base-TX". The ability for
> > a PHY to work on a board where its firmware image wasn't specifically
> > designed for it comes in handy sometimes.
> 
> You're going to get into problems with this on Layerscape, because
> reconfiguring the Serdes etc is something I've tried to highlight
> as being necessary to NXP since SolidRun started using LX2160A. I
> think there's some slow progress towards that, but it's so slow that
> I've basically given up caring about it on the Honeycomb/Clearfog CX
> boards now.
> 
> All the SFP cages on my Honeycomb have been configured for the most
> useful mode to me - 1000BASE-X/SGMII, and I've given up caring about
> USXGMII/10GBASE-R on those ports.

Speaking of that, do you know of any SFP modules that would use USXGMII?
It doesn't appear to be listed in the spec sheet when looking for that.

> > I see that this patch set basically introduces the phydev->host_interfaces
> > bitmap which is an attempt to find the answer to that question. But when
> > will we know enough about phydev->host_interfaces in order to safely
> > make decisions in the PHY driver based on it? phylink sets it, phylib
> > does not.
> 
> It won't be something phylib could set because phylib doesn't know
> the capabilities of its user - it's information that would need to be
> provided to phylib.

So you're saying it would be in phylib's best interest to not set it at
all, not even to a single bit corresponding to phydev->interface. So PHY
drivers could work out this way whether they should operate in backwards
compatibility mode or they could change MACTYPE at will.

> > And many Aquantia systems use the generic PHY driver, as mentioned.
> > Additionally, there are old device trees at play here, which only define
> > the initial SERDES protocol. Would we be changing the behavior for those,
> > in that we would be configuring the PHY to keep the SERDES protocol
> > fixed whereas it would have dynamically changed before?
> 
> We have the same situation on Macchiatobin. The 88X3310 there defaults
> to MACTYPE mode 4, and we've supported this for years with DT describing
> the interface as 10GBASE-R - because we haven't actually cared very much
> what DT says up to this point for the 88X3310. As I said in my previous
> reply, the 88X3310 effectively dictates what the PHY interface mode will
> be, and that is communicated back through phylib to whoever is using
> phylib.

So what is the full backwards compatibility strategy with old DT blobs?
Is it in this patch set? I didn't notice it.

> > Another question is what to do if there are multiple ways of
> > establishing a system-side link. For example 1000 Mbps can be achieved
> > either through SGMII, or USXGMII with symbol replication, or 2500base-x
> > with flow control, or 10GBaseR with flow control. And I want to test
> > them all. What would I need to do to change the SERDES protocol from one
> > value to the other? Changing the phy-mode array in the device tree would
> > be one option, but that may not always be possible.
> 
> First point to make here is that rate adaption at the PHY is really
> not well supported in Linux, and there is no way to know via phylib if
> a PHY is capable or not of rate adaption.
> 
> Today, if you have a 10GBASE-R link between a PHY doing rate adaption
> and the "MAC", then what you will get from phylib is:
> 
> 	phydev->interface = PHY_INTERFACE_MODE_10GBASER;
> 	phydev->speed = SPEED_1000;	// result of media negotiation
> 	phydev->duplex = DUPLEX_FULL;	// result of media negotiation
> 	phydev->pause = ...;		// result of media negotiation
> 	phydev->asym_pause = ...;	// result of media negotiation
> 
> which will, for the majority of implementations, result in the MAC being
> forced to a 1G speed, possibly with or without pause enabled.
> 
> Due to this, if phylink is being used, the parameters given to
> mac_link_up/pcs_link_up will be the result of the media negotiation, not
> what is required on the actual link.
> 
> You mention "10GBaseR with flow control" but there is another
> possibility that exists in real hardware out there. "10GBaseR without
> flow control" and in that case, the MAC needs to pace its transmission
> for the media speed (which is a good reason why mac_link_up should be
> given the result of the media negotiation so it can do transmission
> pacing.)
> 
> I have a follow-up to the response I gave to Sean Anderson on rate-
> adapting PHYs that I need to finish and send, and it would be better
> to have any discussion on this topic after I've sent that reply and
> follow-up to that reply.

Ok, how would the MAC pace itself to send at a lower data rate, if the
SERDES protocol is 10G and the PHY doesn't send it PAUSE frames back?
At least Layerscape systems can't do this AFAIK.

I can watch for updates on this on the other thread.



[Index of Archives]     [Device Tree Compilter]     [Device Tree Spec]     [Linux Driver Backports]     [Video for Linux]     [Linux USB Devel]     [Linux PCI Devel]     [Linux Audio Users]     [Linux Kernel]     [Linux SCSI]     [XFree86]     [Yosemite Backpacking]


  Powered by Linux