Mathieu Desnoyers a écrit : > * Eric Dumazet (dada1@xxxxxxxxxxxxx) wrote: >> From: Stephen Hemminger <shemminger@xxxxxxxxxx> >> >>> Epilogue due to master Jarek. Lockdep carest not about the locking >>> doth bestowed. Therefore no keys are needed. >>> >>> Signed-off-by: Stephen Hemminger <shemminger@xxxxxxxxxx> >> So far, so good, should be ready for inclusion now, nobody complained :) >> >> I include the final patch, merge of your last two patches. >> >> David, could you please review it once again and apply it if it's OK ? >> > [...] >> +/* >> + * Per-CPU read/write lock associated with per-cpu table entries. >> + * This is not a general solution but makes reader locking fast since >> + * there is no shared variable to cause cache ping-pong; but adds an >> + * additional write-side penalty since update must lock all >> + * possible CPU's. >> + * >> + * Read lock is used by ip/arp/ip6 tables rule processing which runs per-cpu. >> + * It needs to ensure that the rules are not being changed while packet >> + * is being processed. In some cases, the read lock will be acquired >> + * twice on the same CPU; this is okay because read locks handle nesting. >> + * >> + * Write lock is used in two cases: >> + * 1. reading counter values >> + * all readers need to be stopped and the per-CPU values are summed. >> + * >> + * 2. replacing tables >> + * any readers that are using the old tables have to complete >> + * before freeing the old table. This is handled by reading >> + * as a side effect of reading counters >> + */ >> +DECLARE_PER_CPU(rwlock_t, xt_info_locks); >> + >> +static inline void xt_info_rdlock_bh(void) >> +{ >> + /* >> + * Note: can not use read_lock_bh(&__get_cpu_var(xt_info_locks)) >> + * because need to ensure that preemption is disable before >> + * acquiring per-cpu-variable, so do it as a two step process >> + */ >> + local_bh_disable(); > > Why do you need to disable bottom halves on the read-side ? You could > probably just disable preemption, given this lock is nestable on the > read-side anyway. Or I'm missing something obvious ? It may not be obvious, but subject already raised on this list, so I'll try to be as precise as possible (But may be wrong on some points, I'll let Patrick correct me if necessary) ipt_do_table() is not a readonly function returning a verdict. 1) It handles a stack (check how is used next->comefrom) that seems to be stored on rules themselves. (This is how I understand this code) This is safe as each cpu has its own copy of rules/counters, and BH protected. 2) It also updates two 64 bit counters (bytes/packets) on each matched rule. 3) Some netfilter matches/targets probably rely on the fact their handlers are run with BH disabled by their caller (ipt_do_table()/arp/ip6...) These must be BH protected (and preempt disabled too), or else : 1) A softirq could interrupt a process in the middle of ipt_do_table() and corrupt its "stack". 2) A softirq could interrupt a process in ipt_do_table() in the middle of the ADD_COUNTER(). Some counters could be corrupted. 3) Some netfiler extensions would break. Previous linux versions already used a read_lock_bh() here, on a single and shared rwlock, there is nothing new on this BH locking AFAIK. Thank you -- To unsubscribe from this list: send the line "unsubscribe netfilter-devel" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html