On Wed, Oct 11, 2017 at 11:18 AM, Florian Westphal <fw@xxxxxxxxx> wrote: > Eric Dumazet <edumazet@xxxxxxxxxx> wrote: >> On Wed, Oct 11, 2017 at 10:48 AM, Florian Westphal <fw@xxxxxxxxx> wrote: >> > Eric Dumazet <edumazet@xxxxxxxxxx> wrote: >> >> On Wed, Oct 11, 2017 at 7:26 AM, Florian Westphal <fw@xxxxxxxxx> wrote: >> >> > xt_replace_table relies on table replacement counter retrieval (which >> >> > uses xt_recseq to synchronize pcpu counters). >> >> > >> >> > This is fine, however with large rule set get_counters() can take >> >> > a very long time -- it needs to synchronize all counters because >> >> > it has to assume concurrent modifications can occur. >> >> > >> >> > Make xt_replace_table synchronize by itself by waiting until all cpus >> >> > had an even seqcount. >> >> > >> >> > This allows a followup patch to copy the counters of the old ruleset >> >> > without any synchonization after xt_replace_table has completed. >> >> > >> >> > Cc: Dan Williams <dcbw@xxxxxxxxxx> >> >> > Cc: Eric Dumazet <edumazet@xxxxxxxxxx> >> >> > Signed-off-by: Florian Westphal <fw@xxxxxxxxx> >> >> > --- >> >> > v3: check for 'seq is uneven' OR 'has changed' since >> >> > last check. Its fine if seq is uneven iff its a different >> >> > sequence number than the initial one. >> >> > >> >> > v2: fix Erics email address >> >> > net/netfilter/x_tables.c | 18 +++++++++++++++--- >> >> > 1 file changed, 15 insertions(+), 3 deletions(-) >> >> > >> >> > >> >> >> >> Reviewed-by: Eric Dumazet <edumazet@xxxxxxxxxx> >> >> >> >> But it seems we need an extra smp_wmb() after >> >> >> >> smp_wmb(); >> >> table->private = newinfo; >> >> + smp_wmb(); >> >> >> >> Otherwise we have no guarantee other cpus actually see the new ->private value. >> > >> > Seems to be unrelated to this change, so I will submit >> > a separate patch for nf.git that adds this. >> >> This is related to this change, please read the comment before the >> local_bh_enable9) >> >> /* >> * Even though table entries have now been swapped, other CPU's >> * may still be using the old entries. This is okay, because >> * resynchronization happens because of the locking done >> * during the get_counters() routine. >> */ > > Hmm, but get_counters() does not issue a wmb, and the 'new' code added > here essentially is the same as get_counters(), except that we only > read seqcount until we saw a change (and not for each counter in > the rule set). > > What am I missing? Your sync code does nothing interesting if we are not sure new table->private value is visible Without barriers, compiler/cpu could do this : + /* ... so wait for even xt_recseq on all cpus */ + for_each_possible_cpu(cpu) { + seqcount_t *s = &per_cpu(xt_recseq, cpu); + u32 seq = raw_read_seqcount(s); + + if (seq & 1) { + do { + cond_resched(); + cpu_relax(); + } while (seq == raw_read_seqcount(s)); + } + } /* finally, write new private value */ table->private = newinfo; Basically, your loop is now useless and you could remove it. So there is definitely a bug. -- To unsubscribe from this list: send the line "unsubscribe netfilter-devel" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html