On Tue, 14 Apr 2009 17:49:57 +0200 Eric Dumazet <dada1@xxxxxxxxxxxxx> wrote:

> Stephen Hemminger wrote:
> > On Tue, 14 Apr 2009 16:23:33 +0200
> > Eric Dumazet <dada1@xxxxxxxxxxxxx> wrote:
> >
> >> Patrick McHardy wrote:
> >>> Stephen Hemminger wrote:
> >>>> This is an alternative version of ip/ip6/arp tables locking using
> >>>> per-cpu locks. This avoids the overhead of synchronize_net() during
> >>>> update but still removes the expensive rwlock of earlier versions.
> >>>>
> >>>> The idea for this came from an earlier version done by Eric Dumazet.
> >>>> Locking is done per-cpu: the fast path locks on the current cpu
> >>>> and updates counters. The slow case involves acquiring the locks on
> >>>> all cpus.
> >>>>
> >>>> The mutex that was added for 2.6.30 in xt_table is unnecessary, since
> >>>> the xt[af].mutex is already held there.
> >>>>
> >>>> Tested basic functionality (add/remove/list), but I don't have test
> >>>> cases for stress, ip6tables or arptables.
> >>> Thanks Stephen, I'll do some testing with ip6tables.
> >> Here is the patch I cooked on top of Stephen's to get proper locking.
> >
> > I see no demonstrated problem with locking in my version.
>
> Yes, I did not crash any machine around here; should we wait for a bug report? :)
>
> > The reader/writer race is already handled. On replace, the race is:
> >
> > CPU 0                      CPU 1
> > lock (iptables(1))
> > refer to oldinfo
> >                            swap in new info
> >                            foreach CPU
> >                               lock iptables(i)
> > (spin)                        unlock(iptables(1))
> > read oldinfo
> > unlock
> > ...
> >
> > The point is my locking works; you just seem to feel more comfortable
> > with a global "stop all CPU's" solution.
>
> Oh right, I missed that xt_replace_table() was *followed* by a get_counters()
> call, but I am pretty sure something is needed in xt_replace_table().
>
> A memory barrier at least (smp_wmb()).
>
> As soon as we do "table->private = newinfo;", other cpus might fetch incorrect
> values for newinfo->fields.
>
> In the past, we had a write_lock_bh()/write_unlock_bh() pair that was
> doing this for us.
> Then we had rcu_assign_pointer(), which also implied this memory barrier.
>
> Even if the vmalloc() calls we do before calling xt_replace_table() probably
> already force barriers, add one for reference, just in case we change the
> callers' logic to call kmalloc() instead of vmalloc() or whatever...

You are right, doing something with a barrier would be safer there.
How about using xchg?

@@ -682,26 +668,19 @@
 xt_replace_table(struct xt_table *table,
	      unsigned int num_counters,
	      struct xt_table_info *newinfo,
	      int *error)
 {
-	struct xt_table_info *oldinfo, *private;
+	struct xt_table_info *private;

 	/* Do the substitution. */
-	mutex_lock(&table->lock);
 	private = table->private;
 	/* Check inside lock: is the old number correct? */
 	if (num_counters != private->number) {
 		duprintf("num_counters != table->private->number (%u/%u)\n",
 			 num_counters, private->number);
-		mutex_unlock(&table->lock);
 		*error = -EAGAIN;
 		return NULL;
 	}
-	oldinfo = private;
-	rcu_assign_pointer(table->private, newinfo);
-	newinfo->initial_entries = oldinfo->initial_entries;
-	mutex_unlock(&table->lock);
-
-	synchronize_net();
-	return oldinfo;
+	newinfo->initial_entries = private->initial_entries;
+	return xchg(&table->private, newinfo);
 }
 EXPORT_SYMBOL_GPL(xt_replace_table);

P.S.: did we all miss the ordering bug in the RCU version??