Re: [PATCH v14 00/15] phy: Add support for Lynx 10G SerDes

Vladimir Oltean <vladimir.oltean@xxxxxxx> · Fri, 11 Aug 2023 19:12:52 +0300

Hi Sean,

On Thu, Aug 10, 2023 at 03:58:36PM -0400, Sean Anderson wrote:
> As explained previously (and noted by yourself below) 1G and 10G RCWs
> have mutally-incompatible clocking requirements. Now that you have
> documented an alternate solution, it is possible to boot up with one RCW
> and switch to another. But without that it was not possible to have one
> board support both RCWs (without e.g. a microcontroller or FPGA to
> configure the clock generator before releasing the processor reset). I
> do not think that the silicon should assert the reset request line if
> the serdes doesn't lock, but it does and it can't really be disabled.
> 
> As it happens, our board is set up so that the reference clocks are
> configured for 10G by default. During this boot, reset request is never
> requested. If we did not have to support software configuration of the
> serdes speed (to support 1G SFPs) we would not have to disconnect reset
> request.
> 
> That said, I have evaluated the various reasons that reset request can
> be asserted, and I have determined that for our product they are not
> necessary. The only major limitation is the lack of a watchdog, but that
> was not a requirement for us. Because of this, using a GPIO for reset is
> sufficient and neatly avoids the issue.

I would like to pause here and highlight the existence of the so-called
XY problem: https://en.wikipedia.org/wiki/XY_problem

| The XY problem is a communication problem encountered in help desk,
| technical support, software engineering, or customer service situations
| where the question is about an end user's attempted solution (Y) rather
| than the root problem itself (X).

You admitted that you needed to solve problem X (software reconfiguration
of the SerDes speed between 1G and 10G), and that has led you to problem
Y (giving the PLLs some refclk frequencies which aren't supported at
power-on reset, and thus, are also not validated by NXP). You've presented
to the Linux mailing lists a driver which solves problem Y, but not X.

I gave you a solution to problem X which doesn't even trigger problem Y.

Furthermore, I gave you a solution to problem Y which is much simpler
than yours. On the 12th of June, in Message-ID: 20230612163353.dwouatvqbuo6h4ea@skbuf,
I explained that if you absolutely insist to use the unsupported PLL
refclks, you can use PBL commands to change the PLL settings (so that
they lock) at power-on reset time. The advantage is that both U-Boot and
Linux will work without having to make any modification to the PLLs,
just treat them as read-only.

As it happens, at NXP we also want to solve problem X in a generic way
(aka we need a procedure that works for all customers, and not just for
your board), and we want to do so in a way that the hardware validation
team agrees with. Thus, the SoC needs to accept the PLL refclks at
power-on reset time. The solution we came up with is the one presented
to you yesterday. It makes your PLL clk driver unnecessary. No one will
come after you if you keep it in your Linux tree and use it, but it is
unnecessary.

> > Nonetheless, below is a functional example of how NXP would recommend
> > you to achieve the desired PLL mapping for any RCW-based SerDes protocol.
> > My testing platform was the LS1046A-QDS with PLL1 at 100 MHz and PLL2 at
> > 156.25 MHz. I believe that this should eliminate the need for a clk
> > driver for the PLLs, and should make your Ethernet lanes usable much
> > earlier than Linux. That being said, our position at NXP is that you
> > don't need a clk driver for the PLLs, and I would like to see the clk
> > portion removed from future patch revisions.
> 
> I have not had any issues with clocking. This is actually one of the
> areas where the reference manual is sufficient to create a working
> driver. Adding flexibility here is very useful, because we can solve
> hardware problems in software. This can reduce e.g. board respins, and
> allow for more interesting clocking solutions (such as allowing clock
> generators which must be configured before use).

I am waiting for someone to come up with a real life use case that
justifies those "more interesting clocking solutions".

Please correct me if you think I am wrong, but as things stand, the
SerDes PLL clk driver is now a solution waiting for a problem. It can
wait for that problem out of tree.

> > There is also the previous observation from Ioana that you should not
> > delete PHY interrupts without finding out why they don't work.
> 
> Well, if you have a better solution, please let me know. The interrupt
> does not work in real hardware.
> 
> I was hampered in my efforts to determine the cause because the interrupt
> passes through an FPGA to which I lack the HDL. So far, I have not seen
> any argument against polling except that we do not understand the
> problem yet. However, I have not seen any other analysis of the problem
> either.

Out of respect for the topic at hand, ask for help in a separate thread,
open an NXP support ticket - do something to split it from the SerDes
work which it is unrelated with. You were told that part of the reason
for the NACK is the fact that you are grouping unrelated things together,
and you did nothing about it.

> > +SRDS_PRTCL_S1=4403
> > +SRDS_PRTCL_S2=21849
> 
> I know it is not typical for NXP RCWs, but your rcw tool supports using
> hex/binary prefixes. Thus, you could rewrite the above lines as
> 
> SRDS_PRTCL_S1=0x1133
> SRDS_PRTCL_S2=0x5559
> 
> IMO this is much easier to read, since it matches the documentation.

Ok, thanks.