Re: [RFC PATCH net-next v3 3/8] net: pcs: pcs-mtk-lynxi: add platform driver for MT7988

Frank Wunderlich <frank-w@xxxxxxxxxxxxxxx> · Wed, 10 Jul 2024 13:17:52 +0200

Hi Russel / netdev

how can we proceed here? development for mt7988/Bpi-R4 is stalled here
because it is unclear what's the right way to go...

We want to avoid much work adding a complete new framework for a "small"
change (imho this affects all SFP-like sub-devices and their linking to
phylink/pcs driver)...if this really the way to go then some boundary
conditions will be helpful.

regards Frank

Am 22.04.24 um 18:24 schrieb Daniel Golle:
Hi Russell,
Hi netdev crew,

I haven't received any reply for this email and am still waiting for
clarification regarding how inter-driver dependencies should be modelled
in that case of mtk_eth_soc. My search for good examples for that in the
kernel code was quite frustrating and all I've found are supposedly bugs
of that exact cathegory.

Please see my questions mentioned in the previous email I've sent and
also a summary of suggested solutions inline below:

On Wed, Feb 07, 2024 at 01:29:18AM +0000, Daniel Golle wrote:
Hi Russell,

sorry for the extended time it took me to get back to this patch and
the comments you made for it. Understanding the complete scope of the
problem took a while (plus there were holidays and other fun things),
and also brought up further questions which I hope you can help me
find good answers for, see below:

On Wed, Dec 13, 2023 at 04:04:12PM +0000, Russell King (Oracle) wrote:
On Tue, Dec 12, 2023 at 03:47:18AM +0000, Daniel Golle wrote:
Introduce a proper platform MFD driver for the LynxI (H)SGMII PCS which
is going to initially be used for the MT7988 SoC.

Signed-off-by: Daniel Golle <daniel@xxxxxxxxxxxxxx>
I made some specific suggestions about what I wanted to see for
"getting" PCS in the previous review, and I'm disappointed that this
patch set is still inventing its own solution.

+struct phylink_pcs *mtk_pcs_lynxi_get(struct device *dev, struct device_node *np)
+{
+	struct platform_device *pdev;
+	struct mtk_pcs_lynxi *mpcs;
+
+	if (!np)
+		return NULL;
+
+	if (!of_device_is_available(np))
+		return ERR_PTR(-ENODEV);
+
+	if (!of_match_node(mtk_pcs_lynxi_of_match, np))
+		return ERR_PTR(-EINVAL);
+
+	pdev = of_find_device_by_node(np);
+	if (!pdev || !platform_get_drvdata(pdev)) {
This is racy - as I thought I described before, userspace can unbind
the device in one thread, while another thread is calling this
function. With just the right timing, this check succeeds, but...

+		if (pdev)
+			put_device(&pdev->dev);
+		return ERR_PTR(-EPROBE_DEFER);
+	}
+
+	mpcs = platform_get_drvdata(pdev);
mpcs ends up being read as NULL here. Even if you did manage to get a
valid pointer, "mpcs" being devm-alloced could be freed from under
you at this point...

+	device_link_add(dev, mpcs->dev, DL_FLAG_AUTOREMOVE_CONSUMER);
resulting in this accessing memory which has been freed.

The solution would be either to suppress the bind/unbind attributes
(provided the underlying struct device can't go away, which probably
also means ensuring the same of the MDIO bus. Aternatively, adding
a lock around the remove path and around the checking of
platform_get_drvdata() down to adding the device link would probably
solve it.

However, I come back to my general point - this kind of stuff is
hairy. Do we want N different implementations of it in various drivers
with subtle bugs, or do we want _one_ implemenatation.

If we go with the one implemenation approach, then we need to think
about whether we should be using device links or not. The problem
could be for network interfaces where one struct device is
associated with multiple network interfaces. Using device links has
the unfortunate side effect that if the PCS for one of those network
interfaces is removed, _all_ network interfaces disappear.
I agree, esp. on this MT7988 removal of a PCS which may then not
even be in use also impairs connectivity on the built-in gigE DSA
switch. It would be nice to try to avoid this.

My original suggestion was to hook into phylink to cause that to
take the link down when an in-use PCS gets removed.
I took a deep dive into how this could be done correctly and how
similar things are done for other drivers. Apart from the PCS there
often also is a muxing-PHY involved, eg. MediaTek's XFI T-PHY in this
case (previously often called "pextp" for no apparent reason) or
Marvell's comphy (mvebu-comphy).

So let's try something simple on an already well-supported platform,
I thought and grabbed Marvell Armada CN9131-DB running vanilla Linux,
and it didn't even take some something racy, but plain:

ip link set eth6 up
cd /sys/bus/platform/drivers/mvebu-comphy
echo f2120000.phy > unbind
echo f4120000.phy > unbind
echo f6120000.phy > unbind
ip link set eth6 down


That was enough. The result is a kernel crash, and the same should
apply on popular platforms like the SolidRun MACCHIATOBin and other
similar boards.

So this gets me to think that there is a wider problem around
non-phylink-managed resources which may disappear while in use by
network drivers, and I guess that the same applies in many other
places. I don't have a SATA drive connected to that Marvell board, but
I can imagine what happens when suddenly removing the comphy instance
in charge of the SATA link and then a subsequent SATA hotplug event
happens or stuff like that...

Anyway, back to network subsystem and phylink:

Do you agree that we need a way to register (and unregister) PCS
providers with phylink, so we don't need *_get_by_of_node implementations
in each driver? If so, can you sketch out what the basic requirements
for an acceptable solution would be?

Also, do you agree that lack of handling of disappearing resources,
such as PHYs (*not* network PHYs, but PHYs as in drivers/phy/*) or
syscons, is already a problem right now which should be addressed?

If you imagine phylink to take care of the life-cycle of all link-
resources, what is vision about those things other than classic MDIO-
connected PHYs?

And well, of course, the easy fix for the example above would be:

diff --git a/drivers/phy/marvell/phy-mvebu-cp110-comphy.c b/drivers/phy/marvell/phy-mvebu-cp110-comphy.c
index b0dd133665986..9517c96319595 100644
--- a/drivers/phy/marvell/phy-mvebu-cp110-comphy.c
+++ b/drivers/phy/marvell/phy-mvebu-cp110-comphy.c
@@ -1099,6 +1099,7 @@ static const struct of_device_id mvebu_comphy_of_match_table[] = {
  MODULE_DEVICE_TABLE(of, mvebu_comphy_of_match_table);

  static struct platform_driver mvebu_comphy_driver = {
+	.suppress_bind_attrs = true,
  	.probe	= mvebu_comphy_probe,
  	.driver	= {
  		.name = "mvebu-comphy",

But that should then apply to every single driver in drivers/phy/...

My remaining questions are:
  - Is it ok to just use .suppress_bind_attrs = true for PCS and PHY
    drivers to avoid having to care out hot-removal?
    Those components are anyway built-into the SoC so removing them
    is physically not possible.

  - In case of driver removal (rmmod -f) imho using a device-link would
    be sufficient to prevent the worst. However, we would have to live
    with the all-or-nothing nature of that approach, ie. once e.g. the
    USXGMII driver is being removed, *all* Ethernet interfaces would
    disappear with it (even those not actually using USXGMII).

If the solutions mentioned above do not sound agreeable, please suggest
how a good solution would look like, ideally also share an example of
any driver in the kernel where this is done in the way you would like
to have things done.

+
+	return &mpcs->pcs;
+}
+EXPORT_SYMBOL(mtk_pcs_lynxi_get);
+
+void mtk_pcs_lynxi_put(struct phylink_pcs *pcs)
+{
+	struct mtk_pcs_lynxi *cur, *mpcs = NULL;
+
+	if (!pcs)
+		return;
+
+	mutex_lock(&instance_mutex);
+	list_for_each_entry(cur, &mtk_pcs_lynxi_instances, node)
+		if (pcs == &cur->pcs) {
+			mpcs = cur;
+			break;
+		}
+	mutex_unlock(&instance_mutex);
I don't see what this loop gains us, other than checking that the "pcs"
is still on the list and hasn't already been removed. If that is all
that this is about, then I would suggest:

	bool found = false;

	if (!pcs)
		return;

	mpcs = pcs_to_mtk_pcs_lynxi(pcs);
	mutex_lock(&instance_mutex);
	list_for_each_entry(cur, &mtk_pcs_lynxi_instances, node)
		if (cur == mpcs) {
			found = true;
			break;
		}
	mutex_unlock(&instance_mutex);

	if (WARN_ON(!found))
		return;

which makes it more obvious why this exists.
The idea was not only to make sure it hasn't been removed, but also
to be sure that what ever the private data pointer points to has
actually been created by that very driver.

But yes, doing it in the way you suggest will work in the same way,
just when having my idea in mind it looks more fishy to use
pcs_to_mtk_pcs_lynxi() before we are 100% sure that what we dealing
with has previously been created by this driver.


Cheers


Daniel