On Sat, Jul 23, 2022 at 07:26:55PM +0200, Marek Behún wrote:
> Does Lynx PCS support 1000base-x with AN?

Yes, that would be the intention.

> Because if so, it may be possible to somehow hack working AN for
> 2500base-x, as I managed it for 88E6393X in the commit I mentioned (by
> configuring 1000base-x and then hacking the PHY speed to 2.5x).

I would need to try and see. For Lynx, dynamically changing from 1000base-x to 2500base-x essentially means moving the SERDES lane from a PLL that can provide the 1.25 GHz required for 1000base-x to a PLL that can provide the 3.125 GHz required for 2500base-x. The procedure itself doesn't involve resetting the PCS, but to be honest with you, I don't know whether the state of the PCS registers is preserved across the PLL change. Maybe it isn't, and this is entirely masked by the phylink major reconfig process; I don't know.

The alternative to dynamic reconfiguration is to program some bits that instruct the SoC what to do on power-on reset, and these bits include the initial SERDES protocols and PLL assignments too. I have only experimented with in-band autoneg in this mode (with the lane being configured for 2.5G out of reset, rather than dynamically switching it to 2.5G).

> Anyway, I am now looking at the standards, and it seems that all the X
> and R have K variant: 1000base-kx, 2500base-kx, 5gbase-kr and
> 10gbase-kr. These modes have mandatory clause 73 autonegotiation.

The X in BASE-X stands for 8b/10b coding, and the R stands for 64b/66b coding, whereas the K stands for bacKplane, i.e. the medium (compare this with the T in BASE-T, for twisted pair copper cable). Likewise, in 1000BASE-SX and 1000BASE-LX, the S stands for Short wavelength laser and the L for Long wavelength. What I'm trying to say is that the 'X' in BASE-X doesn't stand for anything having to do with fiber. I guess 1000BASE-X is just a generic name for the coding scheme (PCS level) rather than something about the medium (PMD level).
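
As a quick sanity check of where the 1.25 GHz and 3.125 GHz figures come from, here is a minimal Python sketch of the coding overhead arithmetic. The table layout and the helper name line_rate_gbaud are made up for illustration, and the data rates are simply the nominal ones implied by the mode names:

```python
from fractions import Fraction

# (nominal data rate in Gbit/s, payload bits, coded bits) per mode.
# X modes use 8b/10b coding, R modes use 64b/66b coding.
MODES = {
    "1000base-x": (Fraction(1), 8, 10),
    "2500base-x": (Fraction(5, 2), 8, 10),
    "10gbase-kr": (Fraction(10), 64, 66),
}

def line_rate_gbaud(mode):
    """Hypothetical helper: serial line rate after coding overhead."""
    data_rate, payload_bits, coded_bits = MODES[mode]
    return data_rate * coded_bits / payload_bits

for mode in MODES:
    print(f"{mode}: {float(line_rate_gbaud(mode))} GBaud")
```

The two 8b/10b modes differ only in their data rate, which is why moving between them comes down to clocking the same lane from a different PLL.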
The terminology is pretty much a mess.

> So either we need to add these as different modes of the
> phy_interface_t type, or we need to differentiate whether clause 37 or
> clause 73 AN should be used by another property.
>
> But since 1000base-x supports clause 37 and 1000base-kx clause 73, the
> one property that we have, managed="in-band-status" is not enough, if
> we keep calling both modes '1000base-x'.
>
> So maybe we really need to add K variants as separate
> PHY_INTERFACE_MODED_ constants. That way we can keep assuming clause 37
> for 2500base-x, and try to implement it for as much drivers as
> possible, by hacking it up...

Well, for good or bad, 10GBase-KR does have its own phy-mode string, and Sean Anderson is sending a patch to add 1000base-KX now too.
https://patchwork.kernel.org/project/netdevbpf/patch/20220719235002.1944800-3-sean.anderson@xxxxxxxx/
(I still don't understand what that has to do with the topic of his series, but anyway)

More at the end.

>
> And I still don't understand this clause 73 AN at all. For example, if
> one PHY supports only up to 2.5g speeds, will it complete AN with
> another PHY that supports up to 10g speeds, if the second PHY will
> (maybe?) try at higher frequency?

Define what you mean by "one PHY supports only up to 2.5G speeds". My copy of IEEE 802.3-2018 doesn't list any signaling mode that is capable of 2.5G in Table 73-4 (Technology Ability Field encoding), but rather 1000BASE-KX, 10GBASE-KR, 25GBASE-KR and so on. So you'd have to express your question in terms of bits that are actually advertised through the Technology Ability field.

Then, clause 73 AN, very much like the clause 28/40 AN of BASE-T (to which it is most directly comparable), has a priority resolution function, meaning that if 2 link partners advertise support for multiple technologies, Table 73-5 (Priority Resolution) will decide which one of the commonly advertised technologies gets used.
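
To make the priority resolution idea concrete, here is a minimal Python sketch, not taken from any driver. The PRIORITY list is an illustrative subset containing only the three technologies named above, ordered by speed as an assumption; the authoritative order is Table 73-5 itself. The function name resolve is hypothetical.

```python
# Highest priority first. Illustrative subset only, ordered by speed
# as an assumption; consult Table 73-5 of IEEE 802.3 for the real list.
PRIORITY = ["25GBASE-KR", "10GBASE-KR", "1000BASE-KX"]

def resolve(local, partner):
    """Return the highest-priority ability advertised by both ends, or None."""
    common = set(local) & set(partner)
    for tech in PRIORITY:
        if tech in common:
            return tech
    return None

# Both ends have 10GBASE-KR in common, so that is what gets used.
print(resolve({"1000BASE-KX", "10GBASE-KR"}, {"10GBASE-KR", "25GBASE-KR"}))
# Nothing in common: there is no technology to bring the link up with.
print(resolve({"1000BASE-KX"}, {"10GBASE-KR"}))
```

The point of the sketch is only that the result is a function of both advertisements, so a question like "will a 2.5G PHY link with a 10G PHY" has to be phrased in terms of which bits each side sets.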

Side note: contrast this with flow control, which annoyingly was designed by IEEE to not have a priority resolution. In other words, you don't get a graceful falloff of the resolved pause modes depending on what you and the link partner advertised; instead you need to preconfigure both ends if you want to achieve a particular result. This is IMO as useless as not having AN at all.

There is of course no guarantee that two backplane link partners will have any technology ability in common, for example one may advertise only 1000Base-KX and the other only 10GBase-KR. In that case, autoneg will complete, but the link will simply not come up.

The clause 73 autoneg signaling takes place using a predetermined, low-speed encoding. The medium transitions to the highest negotiated technology, and performs clause 72 link training on that medium, only after both ends agree that clause 73 autoneg has completed. This kind of implies that they will agree on the frequency being used for the data traffic.

If you're asking whether 2 backplane devices will advertise 10GBase-KR but one of them supports a data rate of only up to 2.5Gbps over that 10G link, I think this is vendor-dependent and IEEE doesn't say anything about it. For example, this is where rate adaptation could come into play, either through flow control, or through an extension to clause 73 similar to what Cisco did with USXGMII, where the lane operates at 10GBaud but via symbol replication your data rate can actually be only 2.5Gbps. I'm not aware of real life applications of rate adaptation over backplane links.

I hinted earlier that clause 73 autoneg is most directly comparable to BASE-T autoneg (these 2 are even situated at different layers if you look at the IEEE OSI stack pictures, compared to where clause 37 AN is).
The problem is that the Linux kernel support for new physical technologies grew organically, and we don't have a structure in place that scales naturally to all the places in which these technologies may appear in the stack.

For example we have the phy-mode, and this represents the ... /goes searching for the documentation, I don't want to be making this up/ ...

  phy-connection-type:
    description:
      Specifies interface type between the Ethernet device and a physical
      layer (PHY) device.

There you go, pretty vague. What's the Ethernet device, and what's the PHY device? For example SGMII connects a MAC to a PHY, but to speak SGMII to reach your PHY, you need another PHY that does the parallel GMII to serial translation for you. So to say that the phy-mode is SGMII, you need to ignore that the MAC has a PHY too.

10GBase-KR is similar in a way: it can be placed at multiple layers, and traditionally, where you put it makes a difference to how we describe it in Linux. Maybe you have a 10GBase-T PHY chip with a backplane host-side PHY, it supports clause 73 declaring the 10GBase-KR technology, then it supports clause 72 link training, the whole shebang. These things exist. How would you describe this? You'd say the phy-mode is "10gbase-kr", according to precedent.

Would that be the best thing to do, in the spirit of clause 73? I don't think it would. What would need to happen as a consequence of this description is that your PCS would populate its Technology Ability with a single bit, corresponding to what you put in phy-mode, because that's how we shoehorned this. Then we'd say what, that managed = "in-band-status" decides whether to bypass clause 73 AN or not? I don't think so.

Truth is, a 10G-KR "PCS" (what we mean when we say a PHY integrated into a MAC) is much more similar to a dedicated 10G-KR PHY, to the point that it's indistinguishable (what Linux thinks of a phy_device is actually 2 PHYs back to back, one for the host side and one for the medium side), and it *needs* to be treated by Linux in the same way regardless of where it's placed. You *need* to be able to control the backplane PCS' advertisement, whether to use FEC or not, regardless if it's your medium facing device, or an in-between device.

The discussion is much, much bigger than this, but in summary, I think it would be quite short-sighted to expand managed = "in-band-status" for anything related to clause 73, or for much more than what it means right now (the problem is, what _does_ it mean and what _doesn't_ it?).

This, plus I think development needs to be driven by someone with real world needs and a sense for what's practical. I am quite well outside of the sphere of 10-gig-and-higher networking, I'm just looking from the peanut gallery, so that won't be me.