* Eric Dumazet (dada1@xxxxxxxxxxxxx) wrote: > Mathieu Desnoyers a écrit : > > * Eric Dumazet (dada1@xxxxxxxxxxxxx) wrote: > >> From: Stephen Hemminger <shemminger@xxxxxxxxxx> > >> > >>> Epilogue due to master Jarek. Lockdep carest not about the locking > >>> doth bestowed. Therefore no keys are needed. > >>> > >>> Signed-off-by: Stephen Hemminger <shemminger@xxxxxxxxxx> > >> So far, so good, should be ready for inclusion now, nobody complained :) > >> > >> I include the final patch, merge of your last two patches. > >> > >> David, could you please review it once again and apply it if it's OK ? > >> > > [...] > >> +/* > >> + * Per-CPU read/write lock associated with per-cpu table entries. > >> + * This is not a general solution but makes reader locking fast since > >> + * there is no shared variable to cause cache ping-pong; but adds an > >> + * additional write-side penalty since update must lock all > >> + * possible CPU's. > >> + * > >> + * Read lock is used by ip/arp/ip6 tables rule processing which runs per-cpu. > >> + * It needs to ensure that the rules are not being changed while packet > >> + * is being processed. In some cases, the read lock will be acquired > >> + * twice on the same CPU; this is okay because read locks handle nesting. > >> + * > >> + * Write lock is used in two cases: > >> + * 1. reading counter values > >> + * all readers need to be stopped and the per-CPU values are summed. > >> + * > >> + * 2. replacing tables > >> + * any readers that are using the old tables have to complete > >> + * before freeing the old table. This is handled by reading > >> + * as a side effect of reading counters > >> + */ > >> +DECLARE_PER_CPU(rwlock_t, xt_info_locks); > >> + > >> +static inline void xt_info_rdlock_bh(void) > >> +{ > >> + /* > >> + * Note: can not use read_lock_bh(&__get_cpu_var(xt_info_locks)) > >> + * because need to ensure that preemption is disable before > >> + * acquiring per-cpu-variable, so do it as a two step process > >> + */ > >> + local_bh_disable(); > > > > Why do you need to disable bottom halves on the read-side ? You could > > probably just disable preemption, given this lock is nestable on the > > read-side anyway. Or I'm missing something obvious ? > > It may not be obvious, but subject already raised on this list, so I'll > try to be as precise as possible (But may be wrong on some points, I'll > let Patrick correct me if necessary) > > ipt_do_table() is not a readonly function returning a verdict. > > 1) It handles a stack (check how is used next->comefrom) that seems to > be stored on rules themselves. (This is how I understand this code) > This is safe as each cpu has its own copy of rules/counters, and BH protected. > > 2) It also updates two 64 bit counters (bytes/packets) on each matched rule. > > 3) Some netfilter matches/targets probably rely on the fact their handlers > are run with BH disabled by their caller (ipt_do_table()/arp/ip6...) > > These must be BH protected (and preempt disabled too), or else : > > 1) A softirq could interrupt a process in the middle of ipt_do_table() > and corrupt its "stack". > > 2) A softirq could interrupt a process in ipt_do_table() in the middle > of the ADD_COUNTER(). Some counters could be corrupted. > > 3) Some netfiler extensions would break. > > Previous linux versions already used a read_lock_bh() here, on a single > and shared rwlock, there is nothing new on this BH locking AFAIK. > > Thank you Thanks for the explanation. It might help to document the role of bh disabling for the reader in a supplementary code comment, otherwise one might think it's been put there to match the bottom half disabling used on the write-side, which has the supplementary role of making sure bh will not deadlock (and this precise behavior is not needed usually on the read-side). One more point : * 1. reading counter values * all readers need to be stopped and the per-CPU values are summed. Maybe it's just me, but this sentence does not seem to clearly indicate that we have : for_each_cpu() write lock() read data write unlock() One might interpret it as : for_each_cpu() write lock() read data for_each_cpu() write unlock() Or maybe it's just my understanding of English that's not perfect. Anyhow, rewording this sentence might not hurt. Something along the lines of : "reading counter values all readers are iteratively stopped to have their per-CPU values summed" This is an important difference, as this behaves more like a RCU-based mechanism than a global per-cpu read/write lock where all the write locks would be taken at once. Mathieu -- Mathieu Desnoyers OpenPGP key fingerprint: 8CD5 52C3 8E3C 4140 715F BA06 3F25 A8FE 3BAE 9A68 -- To unsubscribe from this list: send the line "unsubscribe netfilter-devel" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html