Re: patch problem - mv88e6xxx: flush switchdev FDB workqueue

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Mon, Mar 14, 2022 at 08:21:30AM +0100, gregkh@xxxxxxxxxxxxxxxxxxx wrote:
> On Mon, Mar 14, 2022 at 07:38:50AM +0100, gregkh@xxxxxxxxxxxxxxxxxxx wrote:
> > On Sun, Mar 13, 2022 at 02:10:31PM +0000, Vladimir Oltean wrote:
> > > Hi Daniel,
> > > 
> > > On Sun, Mar 13, 2022 at 03:03:07PM +0100, Daniel Suchy wrote:
> > > > Hello,
> > > > 
> > > > I noticed boot problems on my Turris Omnia (with Marvell 88E6176 switch
> > > > chip) after "net: dsa: mv88e6xxx: flush switchdev FDB workqueue before
> > > > removing VLAN" commit https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=2566a89b9e163b2fcd104d6005e0149f197b8a48
> > > > 
> > > > Within logs I catched hung kernel tasks (see below), at least first is
> > > > related to DSA subsystem.
> > > > 
> > > > When I revert this patch, everything works as expected and without any
> > > > issues.
> > > > 
> > > > In my setup, I have few vlans on affected switch (i'm using ifupdown2 v3.0
> > > > with iproute2 5.16 for configuration).
> > > > 
> > > > It seems your this patch introduces some new problem (at least for 5.15
> > > > kernels). I suggest revert this patch.
> > > > 
> > > > - Daniel
> > > 
> > > Oh wow, I'm terribly sorry. Yes, this patch shouldn't have been
> > > backported to kernel 5.15 and below, but I guess I missed the
> > > backport notification email and forgot to tell Greg about this.
> > > Patch "net: dsa: mv88e6xxx: flush switchdev FDB workqueue before
> > > removing VLAN" needs to be immediately reverted from these trees.
> > > 
> > > Greg, to avoid this from happening in the future, would something like
> > > this work? Is this parsed in some way?
> > > 
> > > Depends-on: 0faf890fc519 ("net: dsa: drop rtnl_lock from dsa_slave_switchdev_event_work") # which first appeared in v5.16
> > 
> > The "Fixes:" tag will solve this, please just use that in the future.
> 
> Ah, you did have a fixes tag here, so then use the way to say "you also
> need to add another patch here" by adding the sha to the line for the
> stable tree:
> 	cc: stable@xxxxxxxxxxxxxxx # 0faf890fc519
> 
> So, should I just backport that commit instead?  The "Fixes:" line says
> this needs to be backported to 4.14, which is why I added it to these
> trees.
> 
> thanks,

No, don't backport the dependency, just revert the patch (hence my
question: how can I describe "don't backport beyond commit X"?).

Here, you can apply the revert attached.
From 571ca982c0a53d127b4c92339262c1e896e9c5a1 Mon Sep 17 00:00:00 2001
From: Vladimir Oltean <vladimir.oltean@xxxxxxx>
Date: Mon, 14 Mar 2022 01:24:59 +0200
Subject: [PATCH] Revert "net: dsa: mv88e6xxx: flush switchdev FDB workqueue
 before removing VLAN"

This reverts commit 2566a89b9e163b2fcd104d6005e0149f197b8a48.

The above change depends on upstream commit 0faf890fc519 ("net: dsa:
drop rtnl_lock from dsa_slave_switchdev_event_work"), which is not
present in linux-5.15.y. Without that change, waiting for the switchdev
workqueue causes deadlocks on the rtnl_mutex.

Backporting the dependency commit isn't trivial/desirable, since it
requires that the following dependencies of the dependency are also
backported:

df405910ab9f net: dsa: sja1105: wait for dynamic config command completion on writes too
eb016afd83a9 net: dsa: sja1105: serialize access to the dynamic config interface
2468346c5677 net: mscc: ocelot: serialize access to the MAC table
f7eb4a1c0864 net: dsa: b53: serialize access to the ARL table
cf231b436f7c net: dsa: lantiq_gswip: serialize access to the PCE registers
338a3a4745aa net: dsa: introduce locking for the address lists on CPU and DSA ports

and then this bugfix on top:

8940e6b669ca ("net: dsa: avoid call to __dev_set_promiscuity() while rtnl_mutex isn't held")

Reported-by: Daniel Suchy <danny@xxxxxxxxxx>
Signed-off-by: Vladimir Oltean <vladimir.oltean@xxxxxxx>
---
 drivers/net/dsa/mv88e6xxx/chip.c | 7 -------
 include/net/dsa.h                | 1 -
 net/dsa/dsa.c                    | 1 -
 net/dsa/dsa_priv.h               | 1 +
 4 files changed, 1 insertion(+), 9 deletions(-)

diff --git a/drivers/net/dsa/mv88e6xxx/chip.c b/drivers/net/dsa/mv88e6xxx/chip.c
index 263da7e2d6be..056e3b65cd27 100644
--- a/drivers/net/dsa/mv88e6xxx/chip.c
+++ b/drivers/net/dsa/mv88e6xxx/chip.c
@@ -2291,13 +2291,6 @@ static int mv88e6xxx_port_vlan_del(struct dsa_switch *ds, int port,
 	if (!mv88e6xxx_max_vid(chip))
 		return -EOPNOTSUPP;
 
-	/* The ATU removal procedure needs the FID to be mapped in the VTU,
-	 * but FDB deletion runs concurrently with VLAN deletion. Flush the DSA
-	 * switchdev workqueue to ensure that all FDB entries are deleted
-	 * before we remove the VLAN.
-	 */
-	dsa_flush_workqueue();
-
 	mv88e6xxx_reg_lock(chip);
 
 	err = mv88e6xxx_port_get_pvid(chip, port, &pvid);
diff --git a/include/net/dsa.h b/include/net/dsa.h
index 49e5ece9361c..d784e76113b8 100644
--- a/include/net/dsa.h
+++ b/include/net/dsa.h
@@ -1056,7 +1056,6 @@ void dsa_unregister_switch(struct dsa_switch *ds);
 int dsa_register_switch(struct dsa_switch *ds);
 void dsa_switch_shutdown(struct dsa_switch *ds);
 struct dsa_switch *dsa_switch_find(int tree_index, int sw_index);
-void dsa_flush_workqueue(void);
 #ifdef CONFIG_PM_SLEEP
 int dsa_switch_suspend(struct dsa_switch *ds);
 int dsa_switch_resume(struct dsa_switch *ds);
diff --git a/net/dsa/dsa.c b/net/dsa/dsa.c
index 4ff03fb262e0..41f36ad8b0ec 100644
--- a/net/dsa/dsa.c
+++ b/net/dsa/dsa.c
@@ -349,7 +349,6 @@ void dsa_flush_workqueue(void)
 {
 	flush_workqueue(dsa_owq);
 }
-EXPORT_SYMBOL_GPL(dsa_flush_workqueue);
 
 int dsa_devlink_param_get(struct devlink *dl, u32 id,
 			  struct devlink_param_gset_ctx *ctx)
diff --git a/net/dsa/dsa_priv.h b/net/dsa/dsa_priv.h
index 33ab7d7af9eb..a5c9bc7b66c6 100644
--- a/net/dsa/dsa_priv.h
+++ b/net/dsa/dsa_priv.h
@@ -170,6 +170,7 @@ void dsa_tag_driver_put(const struct dsa_device_ops *ops);
 const struct dsa_device_ops *dsa_find_tagger_by_name(const char *buf);
 
 bool dsa_schedule_work(struct work_struct *work);
+void dsa_flush_workqueue(void);
 const char *dsa_tag_protocol_to_str(const struct dsa_device_ops *ops);
 
 static inline int dsa_tag_protocol_overhead(const struct dsa_device_ops *ops)
-- 
2.25.1


[Index of Archives]     [Linux Kernel]     [Kernel Development Newbies]     [Linux USB Devel]     [Video for Linux]     [Linux Audio Users]     [Yosemite Hiking]     [Linux Kernel]     [Linux SCSI]

  Powered by Linux