On Fri, Jan 31, 2020 at 03:29:06PM +0100, Andrew Lunn wrote:
> > > But by design SFP, SFP+, and QSFP cages are not fixed function network
> > > adapters. They are physical and logical devices that can adapt to
> > > what is plugged into them. How the devices are exposed should be
> > > irrelevant to this conversation it is about the underlying
> > > connectivity.
> >
> > Apologies - I was under the impression that SFP and friends were a
> > physical-layer thing and that a MAC in the SoC would still be fixed such
> > that its DMA and interrupt configuration could be statically described
> > regardless of what transceiver was plugged in (even if some configurations
> > might not use every interrupt/stream ID/etc.) If that isn't the case I shall
> > go and educate myself further.
>
> Hi Robin
>
> It gets interesting with QSFP cages. The Q is quad, there are 4 SERDES
> lanes. You can use them for 1x 40G link, or you can split them into 4x
> 10G links. So you either need one MAC or 4 MACs connecting to the
> cage, and this can change on the fly when a modules is ejected and
> replaced with another module.

I think it's even more complicated than that. If you have a QSFP+ fiber
module, that can be connected to four fibers which can either go to
another QSFP+ module, or four separate SFP+ modules. That means it's a
manual configuration decision whether to operate the QSFP+ module as a
single 40G link, or as four separate 10G links.

> There are only one set of control pins
> for i2c, loss of signal, TX disable, module inserted. So where the
> interrupt/stream ID/etc are mapped needs some flexibility.

QSFP changes the way the modules are controlled; gone are many of the
hardware signals, replaced by registers in the I2C space. The remaining
hardware signals are:

    ModSelL     module select (to enable the I2C bus)
    ResetL      module reset
    SCL/SDA     I2C bus
    ModPrsL     module present
    IntL        interrupt (but not too useful from what I can see!)
    LPMode      low power mode (can be overridden via the I2C bus)

> There is also to some degree a conflict with hiding all this inside
> firmware. This is complex stuff. It is much better to have one core
> implementing in Linux plus some per hardware driver support, than
> having X firmware blobs, generally closed source, each with there own
> bugs which nobody can fix.

QSFP and SFP support is not really part of the DPAA2 firmware.

I have some prototype implementation for driving the QSFP+ cage, but I
haven't yet worked out how to sensibly deal with the "is it 4x 10G or
1x 40G" issue you mention above, and how to interface the QSFP+ driver
sensibly with one or four network drivers.

I've been concentrating more on the SFP/SFP+ problem on the Honeycomb
board which is what most people will have, working out how to sensibly
drive the hardware so that our existing SFP support in the kernel can
work sensibly.

In the last couple of days, I've managed to get something together which
works, switching between 1000base-X and SGMII on this hardware, using
some of the patches I've already pointed to over the last few weeks.

This hardware falls into the "split PCS and MAC" problem space, so it's
relevant to many people - and it's important that we don't rush into a
solution that works for one implementation and not everyone.

This is why I haven't responded to Jose's proposal - I'm still working
out what is required for others, but what I can say is that it isn't
what Jose has proposed. I had asked Jose to hold off, but he's
understandably eager to solve the problem in front of him at the expense
of everyone else.

What I've found is that any attempt to split the current
"phylink_mac_ops" interface between the PCS and MAC blocks results, as I
suspected, in mvneta and mvpp2 suffering very badly; the hardware does
not split along those functional blocks at all well.

My current state of play for this is in my "cex7" branch, pushed out
earlier today.
It's a bit hacky right now, and there are various issues that need to be
solved, but it is functional with the right board boot configuration
(basically the DPC file, which is one of the configs for the MC
firmware).

I'm planning to look at what's required for the faster speeds; there are
other PCS PHYs on this platform that support the other speeds (10G, 25G,
40G, 100G) accessed via Clause 45 cycles.

As for the issue you've raised with DSA links, I don't see any obvious
solution for that - the whole "if no fixed-link is specified, default to
the highest speed" is a real problem; the conversion of DSA to phylink
for the CPU and DSA ports did not take account of that. phylink has
_zero_ information in that case to know how the link should be
configured - there is no PHY, there is no fixed-link specification,
there is absolutely nothing. So it's no surprise when phylink tries to
configure speed=0 duplex=half pause=off on these interfaces when they're
brought up.

I notice that this work was contributed by NXP - and in my mind it
illustrates that they did not think about what they were doing there
either. They certainly never ran phylink with debugging on and
considered whether the phylink_mac_config() calls contained sensible
information. Did they even have all the information necessary to work
out what was required? I doubt it very much. Did they realise that the
fixed-link specification was optional, did they realise that there could
be a PHY on these links, and did they consider what the behaviour would
be in those cases?

And now we have something of a headache trying to work out how to solve
this - one thing is certain, whatever the fix is, it isn't going to be
nice to be backported to stable trees.

-- 
RMK's Patch system: https://www.armlinux.org.uk/developer/patches/
FTTC broadband for 0.8mile line in suburbia: sync at 12.1Mbps down 622kbps up
According to speedtest.net: 11.9Mbps down 500kbps up
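
For illustration only, the DSA case described above (no PHY, no
fixed-link) can be modelled roughly as below. This is a toy Python
sketch, not kernel or phylink code: LinkConfig, resolve_link() and
describe() are hypothetical names, and preferring a fixed-link over a
PHY is an assumption made just to keep the example short. It only shows
that when neither source of link parameters exists, the zero-initialised
defaults are all that is left to hand to the MAC.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class LinkConfig:
    """Hypothetical stand-in for the link parameters given to a MAC."""
    speed: int = 0             # Mbps; 0 means "nothing configured"
    full_duplex: bool = False  # False corresponds to half duplex
    pause: bool = False        # False corresponds to pause=off


def resolve_link(phy: Optional[LinkConfig],
                 fixed_link: Optional[LinkConfig]) -> LinkConfig:
    """Pick link parameters from whichever source is available.

    Assumption: a fixed-link description wins over a PHY. The real
    precedence rules are not modelled here.
    """
    if fixed_link is not None:
        return fixed_link
    if phy is not None:
        return phy
    # No PHY and no fixed-link: only the zeroed defaults remain.
    return LinkConfig()


def describe(cfg: LinkConfig) -> str:
    """Format a configuration in the style used in the text above."""
    duplex = "full" if cfg.full_duplex else "half"
    pause = "on" if cfg.pause else "off"
    return f"speed={cfg.speed} duplex={duplex} pause={pause}"


if __name__ == "__main__":
    # A DSA/CPU port described with neither a PHY nor a fixed-link.
    print(describe(resolve_link(phy=None, fixed_link=None)))
    # The same port once a fixed-link description is supplied.
    print(describe(resolve_link(phy=None,
                                fixed_link=LinkConfig(1000, True, False))))
```

With no PHY and no fixed-link the sketch yields the speed=0 duplex=half
pause=off combination mentioned above; any real fix has to obtain the
missing information from somewhere, which is exactly the hard part.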