Re: [PATCH 1/1] netfilter: ipset: fix race condition between swap/destroy and kernel side add/del/test, v2

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Hi Florian,

On Sat, 4 Nov 2023, Florian Westphal wrote:

> Jozsef Kadlecsik <kadlec@xxxxxxxxxxxxx> wrote:
> > Linkui Xiao reported that there's a race condition when ipset swap and destroy is
> > called, which can lead to crash in add/del/test element operations. Swap then
> > destroy are usual operations to replace a set with another one in a production
> > system. The issue can in some cases be reproduced with the script:
> 
> > iptables -A INPUT -m set --match-set hash_ip1 src -j ACCEPT
> > while [ 1 ]
> > do
> > 	# ... Ongoing traffic...
> >         ipset create hash_ip2 hash:net family inet hashsize 1024 maxelem 1048576
> >         ipset add hash_ip2 172.20.0.0/16
> >         ipset swap hash_ip1 hash_ip2
> >         ipset destroy hash_ip2
> >         sleep 0.05
> > done
> > 
> > In the race case the possible order of the operations are
> > 
> > 	CPU0			CPU1
> > 	ip_set_test
> > 				ipset swap hash_ip1 hash_ip2
> > 				ipset destroy hash_ip2
> > 	hash_net_kadt
> > 
> > Swap replaces hash_ip1 with hash_ip2 and then destroy removes hash_ip2 which
> > is the original hash_ip1. ip_set_test was called on hash_ip1 and because destroy
> > removed it, hash_net_kadt crashes.
> > 
> > The fix is to protect both the list of the sets and the set pointers in an extended RCU
> > region and before exiting ip_set_swap(), wait to finish all started rcu_read_lock().
> > The first version of the patch was written by Linkui Xiao <xiaolinkui@xxxxxxxxxx>.
> > 
> > v2: synchronize_rcu() is moved into ip_set_swap() in order not to burden
> > ip_set_destroy() unnecessarily when all sets are destroyed.
> 
> Hmm.  Isn't it enough to only call synchronize_rcu() in ip_set_swap?
> 
> All netfilter hooks run with rcu_read_lock() held, em_ipset.c wraps the 
> entire ip_set_test() in rcu read lock/unlock pair.

Hm, actually, you are right. Technically there's no need to extend the 
rcu_read_lock() protected area inside ipset itself.
 
> > @@ -704,13 +704,18 @@ ip_set_rcu_get(struct net *net, ip_set_id_t index)
> >  	struct ip_set_net *inst = ip_set_pernet(net);
> >  
> >  	rcu_read_lock();
> > -	/* ip_set_list itself needs to be protected */
> > +	/* ip_set_list and the set pointer need to be protected */
> >  	set = rcu_dereference(inst->ip_set_list)[index];
> > -	rcu_read_unlock();
> >  
> >  	return set;
> >  }
> 
> ... so I don't understand why ip_set_rcu_get() has to extend
> the locked section.
> 
> AFAICS there are only two type of callers:
> 1. rcu read lock is already held (datapath)
> 2. ipset nfnl subsys mutex is held
> 
> *probably* This could be changed in a separate patch to:
> 
>   -	rcu_read_lock();
>   -	/* ip_set_list itself needs to be protected */
>   -	set = rcu_dereference(inst->ip_set_list)[index];
>   -	rcu_read_unlock();
>   +	/* ip_set_list and the set pointer need to be protected */
>   +	return rcu_dereference_check(inst->ip_set_list, lockdep_nfnl_is_held(NFNL_SUBSYS_IPSET)[index];
> 
> This is an example, probably better to add a small
> "ip_set_dereference_nfnl" helper to hide the lockdep construct...
> 
> Not saying the patch is wrong; rcu read locks nest and
> ipset locking is not simple so I might be missing something.

I'm going to rework the patch, thanks for your input!

Best regards,
Jozsef
-- 
E-mail  : kadlec@xxxxxxxxxxxxxxxxx, kadlecsik.jozsef@xxxxxxxxx
PGP key : https://wigner.hu/~kadlec/pgp_public_key.txt
Address : Wigner Research Centre for Physics
          H-1525 Budapest 114, POB. 49, Hungary



[Index of Archives]     [Netfitler Users]     [Berkeley Packet Filter]     [LARTC]     [Bugtraq]     [Yosemite Forum]

  Powered by Linux