Stephen Hemminger wrote:
> This version of x_tables (ip/ip6/arp) locking uses a per-cpu
> recursive lock that can be nested. It is sort of like the existing kernel_lock,
> rwlock_t and even the old 2.4 brlock.
>
> "Reader" is ip/arp/ip6 tables rule processing, which runs per-cpu.
> It needs to ensure that the rules are not being changed while a packet
> is being processed.
>
> "Writer" is used in two cases: the first is replacing rules, in which case
> all packets in flight have to be processed before the rules are swapped,
> then the counters are read from the old (stale) info. The second case is where
> counters need to be read on the fly; in this case all CPUs are blocked
> from further rule processing until the values are aggregated.
>
> The idea for this came from an earlier version done by Eric Dumazet.
> Locking is done per-cpu: the fast path locks on the current cpu
> and updates counters. This reduces the contention of a
> single reader lock (in 2.6.29) without the delay of synchronize_net()
> (in 2.6.30-rc2).
>
> The mutex that was added for 2.6.30 in xt_table is unnecessary since
> there already is a mutex for xt[af].mutex that is held.
>
> Signed-off-by: Stephen Hemminger <shemminger@xxxxxxxxxx>

I reviewed this patch and believe it's in quite good shape, thanks Stephen.
I then tested it on an 8-CPU x86_32 machine and saw no obvious problems.

Signed-off-by: Eric Dumazet <dada1@xxxxxxxxxxxxx>

Hopefully, the next rcu_bh (or whatever name is used) will permit us to
switch back to pure RCU in 2.6.31.

Here is an oprofile snapshot of a tbench session with light iptables rules
(4 rules in the INPUT chain, 3 rules in OUTPUT):

xt_info_rdlock_bh() uses 0.6786 % of cpu
xt_info_rdunlock_bh() uses 0.1743 % of cpu

CPU: Core 2, speed 3000.77 MHz (estimated)
Counted CPU_CLK_UNHALTED events (Clock cycles when not halted) with a unit mask of 0x00 (Unhalted core cycles) count 100000
samples  cum. samples  %        cum. %   symbol name
1248350  1248350       11.3285  11.3285  copy_from_user
534049   1782399       4.8464   16.1749  copy_to_user
480898   2263297       4.3641   20.5390  __schedule
325581   2588878       2.9546   23.4936  ipt_do_table
312697   2901575       2.8377   26.3312  tcp_ack
309381   3210956       2.8076   29.1388  tcp_sendmsg
248238   3459194       2.2527   31.3915  tcp_v4_rcv
230405   3689599       2.0909   33.4824  tcp_transmit_skb
220638   3910237       2.0022   35.4847  ip_queue_xmit
217099   4127336       1.9701   37.4548  tcp_recvmsg
175885   4303221       1.5961   39.0509  tcp_rcv_established
173112   4476333       1.5710   40.6219  __switch_to
165138   4641471       1.4986   42.1205  sysenter_past_esp
149367   4790838       1.3555   43.4759  dst_release
138619   4929457       1.2579   44.7339  sched_clock_cpu
132724   5062181       1.2044   45.9383  lock_sock_nested
121353   5183534       1.1013   47.0396  nf_iterate
119205   5302739       1.0818   48.1214  netif_receive_skb
118859   5421598       1.0786   49.2000  release_sock
112597   5534195       1.0218   50.2218  __inet_lookup_established
112195   5646390       1.0181   51.2399  sys_socketcall
110018   5756408       0.9984   52.2383  tcp_write_xmit
106466   5862874       0.9662   53.2045  __alloc_skb
93386    5956260       0.8475   54.0519  dev_queue_xmit
89229    6045489       0.8097   54.8617  tcp_event_data_recv
85972    6131461       0.7802   55.6418  local_bh_enable
82882    6214343       0.7521   56.3940  skb_release_data
80898    6295241       0.7341   57.1281  ip_rcv
76380    6371621       0.6931   57.8213  skb_copy_datagram_iovec
74782    6446403       0.6786   58.4999  xt_info_rdlock_bh
73593    6519996       0.6678   59.1677  mod_timer
72884    6592880       0.6614   59.8291  sock_recvmsg
71789    6664669       0.6515   60.4806  __copy_skb_header
70560    6735229       0.6403   61.1209  fget_light
68756    6803985       0.6239   61.7449  get_page_from_freelist
68378    6872363       0.6205   62.3654  put_page
68042    6940405       0.6175   62.9829  ip_finish_output
67618    7008023       0.6136   63.5965  page_address
64894    7072917       0.5889   64.1854  tcp_cleanup_rbuf

>
> ---
> CHANGES
> - optimize for UP
> - disable bottom half in info_rdlock
> - prevent preempt count overflow
> - turn off lockdep in writer to avoid bogus warning
> - optimize unlock_bh
>
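To make the reader/writer scheme described in the patch easier to follow, here is a minimal userspace sketch of a per-cpu recursive lock. It is an analogy, not the patch's code: the names info_rdlock, info_rdunlock, info_sum_counters, NR_CPUS and the packets field are hypothetical; a C11 atomic spinlock stands in for spinlock_t; an explicit cpu index stands in for per-cpu data; and there is no bottom-half disabling, so each cpu index is assumed to be used by only one thread at a time.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

#define NR_CPUS 4 /* hypothetical CPU count for this simulation */

/* Hypothetical userspace analogue of one per-cpu recursive lock slot. */
struct info_lock {
    atomic_bool locked;    /* stands in for spinlock_t */
    unsigned int readers;  /* nesting depth; assumed touched only by the owning "cpu" */
    unsigned long packets; /* per-cpu rule counter protected by the lock */
};

/* Static storage: every slot starts unlocked with zero readers/counters. */
static struct info_lock info_locks[NR_CPUS];

static void spin_acquire(atomic_bool *l)
{
    /* Busy-wait until we are the one that flips false -> true. */
    while (atomic_exchange_explicit(l, true, memory_order_acquire))
        ;
}

static void spin_release(atomic_bool *l)
{
    atomic_store_explicit(l, false, memory_order_release);
}

/* Reader: only the outermost (non-nested) call on this cpu takes the lock. */
void info_rdlock(int cpu)
{
    struct info_lock *lk = &info_locks[cpu];
    if (lk->readers++ == 0)
        spin_acquire(&lk->locked);
}

/* Reader unlock: only the outermost unlock releases the lock. */
void info_rdunlock(int cpu)
{
    struct info_lock *lk = &info_locks[cpu];
    assert(lk->readers > 0);
    if (--lk->readers == 0)
        spin_release(&lk->locked);
}

/* Writer: take each cpu's lock in turn so its counter is stable, then sum. */
unsigned long info_sum_counters(void)
{
    unsigned long total = 0;
    for (int cpu = 0; cpu < NR_CPUS; cpu++) {
        spin_acquire(&info_locks[cpu].locked);
        total += info_locks[cpu].packets;
        spin_release(&info_locks[cpu].locked);
    }
    return total;
}
```

The nesting counter is what lets rule processing re-enter on the same cpu without self-deadlock, while a writer only ever contends with the single cpu whose lock it is currently holding.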