Hi Kyle, On Mon, 4 Sep 2023, Kyle Zeng wrote: > There is a race between IPSET_CMD_ADD and IPSET_CMD_SWAP in > netfilter/ip_set, which can lead to the invocation of `__ip_set_put` on > a wrong `set`, triggering the `BUG_ON(set->ref == 0);` check in it. > > More specifically, in `ip_set_swap`, it will hold the `ip_set_ref_lock` > and then do the following to swap the sets: > ~~~ > strncpy(from_name, from->name, IPSET_MAXNAMELEN); > strncpy(from->name, to->name, IPSET_MAXNAMELEN); > strncpy(to->name, from_name, IPSET_MAXNAMELEN); > > swap(from->ref, to->ref); > ~~~ > But in the retry loop in `call_ad`: > ~~~ > if (retried) { > __ip_set_get(set); > nfnl_unlock(NFNL_SUBSYS_IPSET); > cond_resched(); > nfnl_lock(NFNL_SUBSYS_IPSET); > __ip_set_put(set); > } > ~~~ > No lock is hold, when it does the `cond_resched()`. > As a result, `ip_set_ref_lock` (in thread 2) can swap the set with another when thread > 1 is doing the `cond_resched()`. When it wakes up, the `set` variable > alreays means another `set`, calling `__ip_set_put` on it will decrease > the refcount on the wrong `set`, triggering the `BUG_ON` call. > > I'm not sure what is the proper way to fix this issue so I'm asking for > help. > > A proof-of-concept code that can trigger the bug is attached. > > The bug is confirmed on v5.10, v6.1, v6.5.rc7 and upstream. Thanks for the thorough report. I think the proper fix is to change the reference counter at rescheduling from "ref" to "ref_netlink", which protects long taking procedures (originally just dumping). Could you verify that the patch below fixes the issue? diff --git a/net/netfilter/ipset/ip_set_core.c b/net/netfilter/ipset/ip_set_core.c index e564b5174261..8a9cea8ed5ed 100644 --- a/net/netfilter/ipset/ip_set_core.c +++ b/net/netfilter/ipset/ip_set_core.c @@ -682,6 +682,15 @@ __ip_set_put(struct ip_set *set) /* set->ref can be swapped out by ip_set_swap, netlink events (like dump) need * a separate reference counter */ +static void +__ip_set_get_netlink(struct ip_set *set) +{ + write_lock_bh(&ip_set_ref_lock); + set->ref_netlink++; + write_unlock_bh(&ip_set_ref_lock); +} + + static void __ip_set_put_netlink(struct ip_set *set) { @@ -1693,11 +1702,11 @@ call_ad(struct net *net, struct sock *ctnl, struct sk_buff *skb, do { if (retried) { - __ip_set_get(set); + __ip_set_get_netlink(set); nfnl_unlock(NFNL_SUBSYS_IPSET); cond_resched(); nfnl_lock(NFNL_SUBSYS_IPSET); - __ip_set_put(set); + __ip_set_put_netlink(set); } ip_set_lock(set); Best regards, Jozsef - E-mail : kadlec@xxxxxxxxxxxxxxxxx, kadlecsik.jozsef@xxxxxxxxx PGP key : https://wigner.hu/~kadlec/pgp_public_key.txt Address : Wigner Research Centre for Physics H-1525 Budapest 114, POB. 49, Hungary