On Mon, Mar 14, 2022 at 08:21:30AM +0100, gregkh@xxxxxxxxxxxxxxxxxxx wrote: > On Mon, Mar 14, 2022 at 07:38:50AM +0100, gregkh@xxxxxxxxxxxxxxxxxxx wrote: > > On Sun, Mar 13, 2022 at 02:10:31PM +0000, Vladimir Oltean wrote: > > > Hi Daniel, > > > > > > On Sun, Mar 13, 2022 at 03:03:07PM +0100, Daniel Suchy wrote: > > > > Hello, > > > > > > > > I noticed boot problems on my Turris Omnia (with Marvell 88E6176 switch > > > > chip) after "net: dsa: mv88e6xxx: flush switchdev FDB workqueue before > > > > removing VLAN" commit https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=2566a89b9e163b2fcd104d6005e0149f197b8a48 > > > > > > > > Within logs I catched hung kernel tasks (see below), at least first is > > > > related to DSA subsystem. > > > > > > > > When I revert this patch, everything works as expected and without any > > > > issues. > > > > > > > > In my setup, I have few vlans on affected switch (i'm using ifupdown2 v3.0 > > > > with iproute2 5.16 for configuration). > > > > > > > > It seems your this patch introduces some new problem (at least for 5.15 > > > > kernels). I suggest revert this patch. > > > > > > > > - Daniel > > > > > > Oh wow, I'm terribly sorry. Yes, this patch shouldn't have been > > > backported to kernel 5.15 and below, but I guess I missed the > > > backport notification email and forgot to tell Greg about this. > > > Patch "net: dsa: mv88e6xxx: flush switchdev FDB workqueue before > > > removing VLAN" needs to be immediately reverted from these trees. > > > > > > Greg, to avoid this from happening in the future, would something like > > > this work? Is this parsed in some way? > > > > > > Depends-on: 0faf890fc519 ("net: dsa: drop rtnl_lock from dsa_slave_switchdev_event_work") # which first appeared in v5.16 > > > > The "Fixes:" tag will solve this, please just use that in the future. > > Ah, you did have a fixes tag here, so then use the way to say "you also > need to add another patch here" by adding the sha to the line for the > stable tree: > cc: stable@xxxxxxxxxxxxxxx # 0faf890fc519 > > So, should I just backport that commit instead? The "Fixes:" line says > this needs to be backported to 4.14, which is why I added it to these > trees. > > thanks, No, don't backport the dependency, just revert the patch (hence my question: how can I describe "don't backport beyond commit X"?). Here, you can apply the revert attached.
From 571ca982c0a53d127b4c92339262c1e896e9c5a1 Mon Sep 17 00:00:00 2001 From: Vladimir Oltean <vladimir.oltean@xxxxxxx> Date: Mon, 14 Mar 2022 01:24:59 +0200 Subject: [PATCH] Revert "net: dsa: mv88e6xxx: flush switchdev FDB workqueue before removing VLAN" This reverts commit 2566a89b9e163b2fcd104d6005e0149f197b8a48. The above change depends on upstream commit 0faf890fc519 ("net: dsa: drop rtnl_lock from dsa_slave_switchdev_event_work"), which is not present in linux-5.15.y. Without that change, waiting for the switchdev workqueue causes deadlocks on the rtnl_mutex. Backporting the dependency commit isn't trivial/desirable, since it requires that the following dependencies of the dependency are also backported: df405910ab9f net: dsa: sja1105: wait for dynamic config command completion on writes too eb016afd83a9 net: dsa: sja1105: serialize access to the dynamic config interface 2468346c5677 net: mscc: ocelot: serialize access to the MAC table f7eb4a1c0864 net: dsa: b53: serialize access to the ARL table cf231b436f7c net: dsa: lantiq_gswip: serialize access to the PCE registers 338a3a4745aa net: dsa: introduce locking for the address lists on CPU and DSA ports and then this bugfix on top: 8940e6b669ca ("net: dsa: avoid call to __dev_set_promiscuity() while rtnl_mutex isn't held") Reported-by: Daniel Suchy <danny@xxxxxxxxxx> Signed-off-by: Vladimir Oltean <vladimir.oltean@xxxxxxx> --- drivers/net/dsa/mv88e6xxx/chip.c | 7 ------- include/net/dsa.h | 1 - net/dsa/dsa.c | 1 - net/dsa/dsa_priv.h | 1 + 4 files changed, 1 insertion(+), 9 deletions(-) diff --git a/drivers/net/dsa/mv88e6xxx/chip.c b/drivers/net/dsa/mv88e6xxx/chip.c index 263da7e2d6be..056e3b65cd27 100644 --- a/drivers/net/dsa/mv88e6xxx/chip.c +++ b/drivers/net/dsa/mv88e6xxx/chip.c @@ -2291,13 +2291,6 @@ static int mv88e6xxx_port_vlan_del(struct dsa_switch *ds, int port, if (!mv88e6xxx_max_vid(chip)) return -EOPNOTSUPP; - /* The ATU removal procedure needs the FID to be mapped in the VTU, - * but FDB deletion runs concurrently with VLAN deletion. Flush the DSA - * switchdev workqueue to ensure that all FDB entries are deleted - * before we remove the VLAN. - */ - dsa_flush_workqueue(); - mv88e6xxx_reg_lock(chip); err = mv88e6xxx_port_get_pvid(chip, port, &pvid); diff --git a/include/net/dsa.h b/include/net/dsa.h index 49e5ece9361c..d784e76113b8 100644 --- a/include/net/dsa.h +++ b/include/net/dsa.h @@ -1056,7 +1056,6 @@ void dsa_unregister_switch(struct dsa_switch *ds); int dsa_register_switch(struct dsa_switch *ds); void dsa_switch_shutdown(struct dsa_switch *ds); struct dsa_switch *dsa_switch_find(int tree_index, int sw_index); -void dsa_flush_workqueue(void); #ifdef CONFIG_PM_SLEEP int dsa_switch_suspend(struct dsa_switch *ds); int dsa_switch_resume(struct dsa_switch *ds); diff --git a/net/dsa/dsa.c b/net/dsa/dsa.c index 4ff03fb262e0..41f36ad8b0ec 100644 --- a/net/dsa/dsa.c +++ b/net/dsa/dsa.c @@ -349,7 +349,6 @@ void dsa_flush_workqueue(void) { flush_workqueue(dsa_owq); } -EXPORT_SYMBOL_GPL(dsa_flush_workqueue); int dsa_devlink_param_get(struct devlink *dl, u32 id, struct devlink_param_gset_ctx *ctx) diff --git a/net/dsa/dsa_priv.h b/net/dsa/dsa_priv.h index 33ab7d7af9eb..a5c9bc7b66c6 100644 --- a/net/dsa/dsa_priv.h +++ b/net/dsa/dsa_priv.h @@ -170,6 +170,7 @@ void dsa_tag_driver_put(const struct dsa_device_ops *ops); const struct dsa_device_ops *dsa_find_tagger_by_name(const char *buf); bool dsa_schedule_work(struct work_struct *work); +void dsa_flush_workqueue(void); const char *dsa_tag_protocol_to_str(const struct dsa_device_ops *ops); static inline int dsa_tag_protocol_overhead(const struct dsa_device_ops *ops) -- 2.25.1