This is a note to let you know that I've just added the patch titled netlink: don't call ->netlink_bind with table lock held to the 5.10-stable tree which can be found at: http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=summary The filename of the patch is: netlink-don-t-call-netlink_bind-with-table-lock-held.patch and it can be found in the queue-5.10 subdirectory. If you, or anyone else, feels it should not be added to the stable tree, please let <stable@xxxxxxxxxxxxxxx> know about it. >From stable+bounces-5294-greg=kroah.com@xxxxxxxxxxxxxxx Mon Dec 11 13:39:02 2023 From: Ido Schimmel <idosch@xxxxxxxxxx> Date: Mon, 11 Dec 2023 14:37:37 +0200 Subject: netlink: don't call ->netlink_bind with table lock held To: <stable@xxxxxxxxxxxxxxx> Cc: <davem@xxxxxxxxxxxxx>, <kuba@xxxxxxxxxx>, <pabeni@xxxxxxxxxx>, <edumazet@xxxxxxxxxx>, <nhorman@xxxxxxxxxxxxx>, <yotam.gi@xxxxxxxxx>, <sashal@xxxxxxxxxx>, <fw@xxxxxxxxx>, <jacob.e.keller@xxxxxxxxx>, <jiri@xxxxxxxxxx> Message-ID: <20231211123740.822802-2-idosch@xxxxxxxxxx> From: Ido Schimmel <idosch@xxxxxxxxxx> From: Florian Westphal <fw@xxxxxxxxx> commit f2764bd4f6a8dffaec3e220728385d9756b3c2cb upstream. When I added support to allow generic netlink multicast groups to be restricted to subscribers with CAP_NET_ADMIN I was unaware that a genl_bind implementation already existed in the past. It was reverted due to ABBA deadlock: 1. ->netlink_bind gets called with the table lock held. 2. genetlink bind callback is invoked, it grabs the genl lock. But when a new genl subsystem is (un)registered, these two locks are taken in reverse order. One solution would be to revert again and add a comment in genl referring 1e82a62fec613, "genetlink: remove genl_bind"). This would need a second change in mptcp to not expose the raw token value anymore, e.g. by hashing the token with a secret key so userspace can still associate subflow events with the correct mptcp connection. However, Paolo Abeni reminded me to double-check why the netlink table is locked in the first place. I can't find one. netlink_bind() is already called without this lock when userspace joins a group via NETLINK_ADD_MEMBERSHIP setsockopt. Same holds for the netlink_unbind operation. Digging through the history, commit f773608026ee1 ("netlink: access nlk groups safely in netlink bind and getname") expanded the lock scope. commit 3a20773beeeeade ("net: netlink: cap max groups which will be considered in netlink_bind()") .. removed the nlk->ngroups access that the lock scope extension was all about. Reduce the lock scope again and always call ->netlink_bind without the table lock. The Fixes tag should be vs. the patch mentioned in the link below, but that one got squash-merged into the patch that came earlier in the series. Fixes: 4d54cc32112d8d ("mptcp: avoid lock_fast usage in accept path") Link: https://lore.kernel.org/mptcp/20210213000001.379332-8-mathew.j.martineau@xxxxxxxxxxxxxxx/T/#u Cc: Cong Wang <xiyou.wangcong@xxxxxxxxx> Cc: Xin Long <lucien.xin@xxxxxxxxx> Cc: Johannes Berg <johannes.berg@xxxxxxxxx> Cc: Sean Tranchetti <stranche@xxxxxxxxxxxxxx> Cc: Paolo Abeni <pabeni@xxxxxxxxxx> Cc: Pablo Neira Ayuso <pablo@xxxxxxxxxxxxx> Signed-off-by: Florian Westphal <fw@xxxxxxxxx> Signed-off-by: David S. Miller <davem@xxxxxxxxxxxxx> Signed-off-by: Ido Schimmel <idosch@xxxxxxxxxx> Signed-off-by: Greg Kroah-Hartman <gregkh@xxxxxxxxxxxxxxxxxxx> --- net/netlink/af_netlink.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) --- a/net/netlink/af_netlink.c +++ b/net/netlink/af_netlink.c @@ -1021,7 +1021,6 @@ static int netlink_bind(struct socket *s return -EINVAL; } - netlink_lock_table(); if (nlk->netlink_bind && groups) { int group; @@ -1033,13 +1032,14 @@ static int netlink_bind(struct socket *s if (!err) continue; netlink_undo_bind(group, groups, sk); - goto unlock; + return err; } } /* No need for barriers here as we return to user-space without * using any of the bound attributes. */ + netlink_lock_table(); if (!bound) { err = nladdr->nl_pid ? netlink_insert(sk, nladdr->nl_pid) : Patches currently in stable-queue which might be from kroah.com@xxxxxxxxxxxxxxx are queue-5.10/psample-require-cap_net_admin-when-joining-packets-group.patch queue-5.10/drop_monitor-require-cap_sys_admin-when-joining-events-group.patch queue-5.10/netlink-don-t-call-netlink_bind-with-table-lock-held.patch queue-5.10/genetlink-add-cap_net_admin-test-for-multicast-bind.patch