Re: Race between IPSET_CMD_CREATE and IPSET_CMD_SWAP

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Hi Kyle,

On Mon, 4 Sep 2023, Kyle Zeng wrote:

> There is a race between IPSET_CMD_ADD and IPSET_CMD_SWAP in 
> netfilter/ip_set, which can lead to the invocation of `__ip_set_put` on 
> a wrong `set`, triggering the `BUG_ON(set->ref == 0);` check in it.
> 
> More specifically, in `ip_set_swap`, it will hold the `ip_set_ref_lock`
> and then do the following to swap the sets:
> ~~~
> 	strncpy(from_name, from->name, IPSET_MAXNAMELEN);
> 	strncpy(from->name, to->name, IPSET_MAXNAMELEN);
> 	strncpy(to->name, from_name, IPSET_MAXNAMELEN);
> 
> 	swap(from->ref, to->ref);
> ~~~
> But in the retry loop in `call_ad`:
> ~~~
> 		if (retried) {
> 			__ip_set_get(set);
> 			nfnl_unlock(NFNL_SUBSYS_IPSET);
> 			cond_resched();
> 			nfnl_lock(NFNL_SUBSYS_IPSET);
> 			__ip_set_put(set);
> 		}
> ~~~
> No lock is hold, when it does the `cond_resched()`.
> As a result, `ip_set_ref_lock` (in thread 2) can swap the set with another when thread
> 1 is doing the `cond_resched()`. When it wakes up, the `set` variable
> alreays means another `set`, calling `__ip_set_put` on it will decrease
> the refcount on the wrong `set`, triggering the `BUG_ON` call.
> 
> I'm not sure what is the proper way to fix this issue so I'm asking for
> help.
> 
> A proof-of-concept code that can trigger the bug is attached.
> 
> The bug is confirmed on v5.10, v6.1, v6.5.rc7 and upstream.

Thanks for the thorough report. I think the proper fix is to change the 
reference counter at rescheduling from "ref" to "ref_netlink", which 
protects long taking procedures (originally just dumping). Could you 
verify that the patch below fixes the issue?

diff --git a/net/netfilter/ipset/ip_set_core.c b/net/netfilter/ipset/ip_set_core.c
index e564b5174261..8a9cea8ed5ed 100644
--- a/net/netfilter/ipset/ip_set_core.c
+++ b/net/netfilter/ipset/ip_set_core.c
@@ -682,6 +682,15 @@ __ip_set_put(struct ip_set *set)
 /* set->ref can be swapped out by ip_set_swap, netlink events (like dump) need
  * a separate reference counter
  */
+static void
+__ip_set_get_netlink(struct ip_set *set)
+{
+	write_lock_bh(&ip_set_ref_lock);
+	set->ref_netlink++;
+	write_unlock_bh(&ip_set_ref_lock);
+}
+
+
 static void
 __ip_set_put_netlink(struct ip_set *set)
 {
@@ -1693,11 +1702,11 @@ call_ad(struct net *net, struct sock *ctnl, struct sk_buff *skb,
 
 	do {
 		if (retried) {
-			__ip_set_get(set);
+			__ip_set_get_netlink(set);
 			nfnl_unlock(NFNL_SUBSYS_IPSET);
 			cond_resched();
 			nfnl_lock(NFNL_SUBSYS_IPSET);
-			__ip_set_put(set);
+			__ip_set_put_netlink(set);
 		}
 
 		ip_set_lock(set);

Best regards,
Jozsef
-
E-mail  : kadlec@xxxxxxxxxxxxxxxxx, kadlecsik.jozsef@xxxxxxxxx
PGP key : https://wigner.hu/~kadlec/pgp_public_key.txt
Address : Wigner Research Centre for Physics
          H-1525 Budapest 114, POB. 49, Hungary



[Index of Archives]     [Netfitler Users]     [Berkeley Packet Filter]     [LARTC]     [Bugtraq]     [Yosemite Forum]

  Powered by Linux