Hi Pablo,

This should obviously have been for nf-next, and I also forgot to cc
netfilter-devel@xxxxxxxxxxxxxxx ... do you want me to repost?

--Jesper

On Thu, 27 Feb 2014 17:41:10 +0100
Jesper Dangaard Brouer <brouer@xxxxxxxxxx> wrote:

> This patchset changes the conntrack locking and provides a huge
> performance improvement.
>
> This patchset is based upon Eric Dumazet's proposed patch:
>   http://thread.gmane.org/gmane.linux.network/268758/focus=47306
> In agreement with Eric Dumazet, I have taken over this patch (and
> turned it into an entire patchset).
>
> The primary focus is to remove the central spinlock nf_conntrack_lock.
> This requires several steps to be achieved.
>
> Patch01: Trivial cleanups.
>
> Patch02: Moves the "special" dying/unconfirmed/template lists to use a
>  per-cpu spinlock.
>
> Patch03: Prepares for patch04 by addressing a race condition. This is
>  done as a separate patch for the reviewers' sake.
>
> Patch04: Separates expect locking from nf_conntrack_lock. The expect
>  list is small (default max 256), thus it just gets a single lock.
>
> Patch05: Finally removes nf_conntrack_lock, and instead uses an
>  array of hashed spinlocks to protect insertions/deletions of
>  conntracks into the hash table, while still allowing dynamic
>  resizing of the hash table.
>
>
> Testing
> -------
> For expectations I've mostly tested the FTP nf_conntrack_ftp
> helper module, with the commands:
>
>  for x in `seq 1 300`; do \
>     echo $x; \
>     echo -e "USER anonymous\nPASS nothing\nPASV" | nc 192.168.42.129 21; \
>  done
>
>  wget ftp://192.168.42.129/pub/delete.me.4k -O /dev/null
>
> For overload/DoS testing, I've primarily done SYN-flood attack testing.
> Results on a 24-core E5-2695v2(ES) with 10Gbit/s ixgbe (with tool trafgen)
>
>  Base kernel : New   810.405 conntrack/sec
>  Fixed kernel: New 2.233.876 conntrack/sec
>
> Notice that other flood attacks (SYN+ACK or ACK) can easily be deflected using:
>  # iptables -A INPUT -m state --state INVALID -j DROP
>  # sysctl -w net/netfilter/nf_conntrack_tcp_loose=0
>
> E.g. this machine can deflect 6.481.463 "invalid" conntrack/sec (from
> an ACK-flood).
>
> Perf data:
> ----------
> The nf_conntrack_lock suffers from huge contention on current
> generation servers (8 or more cores/threads). The data below was
> collected under SYN-flooding (without a listen socket).
>
> Perf locking congestion is very "visible" on a base kernel:
>
> -  72.56%  ksoftirqd/6  [kernel.kallsyms]  [k] _raw_spin_lock_bh
>    - _raw_spin_lock_bh
>       + 25.33% init_conntrack
>       + 24.86% nf_ct_delete_from_lists
>       + 24.62% __nf_conntrack_confirm
>       + 24.38% destroy_conntrack
>       + 0.70% tcp_packet
> +   2.21%  ksoftirqd/6  [kernel.kallsyms]  [k] fib_table_lookup
> +   1.15%  ksoftirqd/6  [kernel.kallsyms]  [k] __slab_free
> +   0.77%  ksoftirqd/6  [kernel.kallsyms]  [k] inet_getpeer
> +   0.70%  ksoftirqd/6  [nf_conntrack]     [k] nf_ct_delete
> +   0.55%  ksoftirqd/6  [ip_tables]        [k] ipt_do_table
>
> Perf after the patchset (SYN-flood attack):
>
> +   9.62%  ksoftirqd/6  [kernel.kallsyms]  [k] fib_table_lookup
> +   3.78%  ksoftirqd/6  [kernel.kallsyms]  [k] __slab_free
> +   2.71%  ksoftirqd/6  [kernel.kallsyms]  [k] inet_getpeer
> +   2.55%  ksoftirqd/6  [kernel.kallsyms]  [k] check_leaf
> +   2.38%  ksoftirqd/6  [ip_tables]        [k] ipt_do_table
> +   2.06%  ksoftirqd/6  [kernel.kallsyms]  [k] __slab_alloc
> +   1.94%  ksoftirqd/6  [nf_conntrack]     [k] __nf_conntrack_alloc
> -   1.94%  ksoftirqd/6  [kernel.kallsyms]  [k] _raw_spin_lock
>    - _raw_spin_lock
>       + 90.32% nf_conntrack_double_lock
>       + 3.61% get_partial_node
>       + 1.81% nf_ct_delete_from_lists
>       + 1.68% __nf_conntrack_confirm
>       + 1.03% sch_direct_xmit
>       + 0.52% scheduler_tick
> +   1.86%  ksoftirqd/6  [kernel.kallsyms]  [k] nf_iterate
> +   1.80%  ksoftirqd/6  [nf_conntrack]     [k] init_conntrack
> +   1.77%  ksoftirqd/6  [kernel.kallsyms]  [k] __neigh_event_send
> -   1.70%  ksoftirqd/6  [kernel.kallsyms]  [k] _raw_spin_lock_bh
>    - _raw_spin_lock_bh
>       + 32.55% nf_ct_del_from_dying_or_unconfirmed_list
>       + 25.33% init_conntrack
>       + 19.88% tcp_packet
>       + 17.97% nf_ct_delete_from_lists
>       + 1.62% nf_conntrack_in
>       + 1.33% ixgbe_poll
>       + 0.74% destroy_conntrack
> +   1.64%  ksoftirqd/6  [nf_conntrack]     [k] hash_conntrack_raw
> +   1.58%  ksoftirqd/6  [kernel.kallsyms]  [k] __netif_receive_skb_core
> +   1.51%  ksoftirqd/6  [nf_conntrack]     [k] __nf_conntrack_find_get
> +   1.48%  ksoftirqd/6  [kernel.kallsyms]  [k] __cmpxchg_double_slab
> +   1.46%  ksoftirqd/6  [nf_conntrack]     [k] nf_conntrack_in
> +   1.45%  ksoftirqd/6  [kernel.kallsyms]  [k] __local_bh_enable_ip
>
>
> ---
>
> Jesper Dangaard Brouer (5):
>       netfilter: conntrack: remove central spinlock nf_conntrack_lock
>       netfilter: conntrack: seperate expect locking from nf_conntrack_lock
>       netfilter: avoid race with exp->master ct
>       netfilter: conntrack: spinlock per cpu to protect special lists.
>       netfilter: trivial code cleanup and doc changes
>
>
>  include/net/netfilter/nf_conntrack.h      |   11 +
>  include/net/netfilter/nf_conntrack_core.h |    9 +
>  include/net/netns/conntrack.h             |   13 +
>  net/netfilter/nf_conntrack_core.c         |  427 ++++++++++++++++++++---------
>  net/netfilter/nf_conntrack_expect.c       |   36 ++
>  net/netfilter/nf_conntrack_h323_main.c    |    4
>  net/netfilter/nf_conntrack_helper.c       |   37 ++-
>  net/netfilter/nf_conntrack_netlink.c      |  128 +++++----
>  net/netfilter/nf_conntrack_sip.c          |    8 -
>  9 files changed, 456 insertions(+), 217 deletions(-)
>

-- 
Best regards,
  Jesper Dangaard Brouer
  MSc.CS, Sr. Network Kernel Developer at Red Hat
  Author of http://www.iptv-analyzer.org
  LinkedIn: http://www.linkedin.com/in/brouer
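
Illustration of the Patch05 idea from the quoted cover letter (an array of
hashed spinlocks protecting hash-table insertions/deletions). This is only a
minimal user-space sketch, not the kernel code from the patchset: the names
NUM_LOCKS, bucket_locks, lock_index, double_lock and double_unlock are
hypothetical, and a C11 atomic_flag stands in for the kernel spinlock. It
assumes an entry can be hashed under two buckets (the perf output above names
nf_conntrack_double_lock), so two locks may be needed at once; they are taken
in ascending index order to avoid an ABBA deadlock.

```c
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical number of locks; far fewer locks than hash buckets is fine. */
#define NUM_LOCKS 1024u

static atomic_flag bucket_locks[NUM_LOCKS];

/* Put every lock into the unlocked state. */
static void locks_init(void)
{
    for (unsigned int i = 0; i < NUM_LOCKS; i++)
        atomic_flag_clear(&bucket_locks[i]);
}

/* Map a hash value onto one of the locks. */
static unsigned int lock_index(uint32_t hash)
{
    return hash % NUM_LOCKS;
}

/* Simple test-and-set spinlock, standing in for a kernel spinlock. */
static void lock_one(unsigned int idx)
{
    while (atomic_flag_test_and_set_explicit(&bucket_locks[idx],
                                             memory_order_acquire))
        ; /* busy-wait until the holder releases the lock */
}

static void unlock_one(unsigned int idx)
{
    atomic_flag_clear_explicit(&bucket_locks[idx], memory_order_release);
}

/*
 * Take the locks covering two hash values. Locks are always acquired in
 * ascending index order, and only once if both hashes share a lock.
 */
static void double_lock(uint32_t h1, uint32_t h2)
{
    unsigned int a = lock_index(h1);
    unsigned int b = lock_index(h2);

    if (a > b) {
        unsigned int tmp = a;
        a = b;
        b = tmp;
    }
    lock_one(a);
    if (a != b)
        lock_one(b);
}

/* Release the locks taken by double_lock() for the same two hashes. */
static void double_unlock(uint32_t h1, uint32_t h2)
{
    unsigned int a = lock_index(h1);
    unsigned int b = lock_index(h2);

    unlock_one(a);
    if (a != b)
        unlock_one(b);
}
```

In this sketch a caller would run locks_init() once, then wrap each
insertion or deletion in double_lock(h1, h2) / double_unlock(h1, h2), so
that only operations touching the same lock slots contend with each other.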