Hi Jozsef, On Mon, Feb 29, 2016 at 01:47:59PM +0100, Jozsef Kadlecsik wrote: > On Mon, 29 Feb 2016, Pablo Neira Ayuso wrote: > > > On Wed, Feb 24, 2016 at 09:19:26PM +0100, Jozsef Kadlecsik wrote: > > > Flushing/listing entries was not RCU safe, so parallel flush/dump > > > could lead to kernel crash. Bug reported by Deniz Eren. > > > > > > Fixes netfilter bugzilla id #1050. > > > > > > Signed-off-by: Jozsef Kadlecsik <kadlec@xxxxxxxxxxxxxxxxx> > > > --- > > > net/netfilter/ipset/ip_set_core.c | 3 ++ > > > net/netfilter/ipset/ip_set_list_set.c | 55 ++++++++++++++++------------------- > > > 2 files changed, 28 insertions(+), 30 deletions(-) > > > > > > diff --git a/net/netfilter/ipset/ip_set_core.c b/net/netfilter/ipset/ip_set_core.c > > > index 95db43f..7e6568c 100644 > > > --- a/net/netfilter/ipset/ip_set_core.c > > > +++ b/net/netfilter/ipset/ip_set_core.c > > > @@ -985,6 +985,9 @@ static int ip_set_destroy(struct net *net, struct sock *ctnl, > > > if (unlikely(protocol_failed(attr))) > > > return -IPSET_ERR_PROTOCOL; > > > > > > + /* Must wait for flush to be really finished in list:set */ > > > + rcu_barrier(); > > > > Jozsef, are you sure you need this rcu_barrier()? > > > > This is waiting for rcu callback completion (ie. decrement the set > > reference counter and releasing the extension and object itself). > > Yes, that's exactly what it's waiting for, in the case of sets which are > elements in set:list type of sets. > > > The rcu read side should be safe when accessing old copies from the > > dumping path when using call_rcu(). > > Three competing actions play role here: flush/delete, destroy, dumping. > The rcu_barrier() above waits for the flush/delete actions. The rcu read > side has access to the set id only and must lookup the set by id in order > to get the name, which is protected by a reference counter. The function > ip_set_name_byindex() intentionally checks that the reference counter > cannot be equal to zero and I believe it's an important internal sanity > checking and I don't want to remove it. The problem was that the reference > counter of the set was first decremented at flush/delete and so we could > get a BUG_ON() crash in ip_set_name_byindex() when an ongoing dumping was > processed. So the reference counter must be released last for a successful > parallel dumping. If rcu_barrier() was not added to the destroy path, a > "destroy" immediately following a "flush" can lead to the error "set is in > use, cannot delete". I tested it and couldn't find a simpler way. ip_set_destroy() calls ip_set_destroy_set() which releases the set object and its content without waiting for the rcu grace period. I think readers (dump path) may still be walking on the set by when you're releasing this object. Our nfnetlink dump path is lockless (rcu read side protected), so we unpublish the set object and release objects via rcu callback when operating from the control plane. -- To unsubscribe from this list: send the line "unsubscribe netfilter-devel" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html