Patrick McHardy wrote:
> Fabian Hugelshofer wrote:
>> Again most of the time is spent in the kernel. Memory and skb operations
>> are accounted there. I suspect that they cause the most overhead.
>>
>> Do you plan to dig deeper into optimising the non-optimal parts? I
>> consider myself not to have enough understanding to do it myself.
>
> The first thing to try would be to use sane allocation sizes for the
> event messages. This patch doesn't implement it properly (it uses
> probing), but should be enough to test whether it helps.
Thanks a lot. This patch already decreased the CPU usage for ctevtest from 85% to 44%. Sweet...
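If I understand the idea, the proper version would size the event skb from the attributes that will actually be emitted, instead of allocating a fixed page-sized buffer for every event. Roughly like the following minimal sketch; ctnetlink_event_size() and its attribute list are hypothetical, for illustration only, and not what the actual patch does:

/* Hypothetical helper: estimate the message size up front.
 * The attribute list is illustrative, not complete. */
static inline size_t ctnetlink_event_size(const struct nf_conn *ct)
{
	return NLMSG_ALIGN(sizeof(struct nfgenmsg))
	       + 2 * nla_total_size(0)               /* CTA_TUPLE_ORIG/REPLY nests */
	       + 4 * nla_total_size(sizeof(__be32))  /* IPv4 src/dst, both tuples */
	       + 4 * nla_total_size(sizeof(__be16))  /* ports, both tuples */
	       + nla_total_size(sizeof(__be32));     /* CTA_STATUS */
}

/* ...then in ctnetlink_conntrack_event(): */
	skb = alloc_skb(NLMSG_SPACE(ctnetlink_event_size(ct)), GFP_ATOMIC);
	if (!skb)
		return NOTIFY_DONE;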
I created a new callgraph profile, which you'll find attached to this mail. Let's have a look at two parts.
First:

2055      2.7205  ctnetlink_conntrack_event
  2378     21.6201  nla_put
  2181     19.8291  nfnetlink_send
  2055     18.6835  ctnetlink_conntrack_event [self]
  1250     11.3647  __alloc_skb
  955       8.6826  ipv4_tuple_to_nlattr
  752       6.8370  nf_ct_port_tuple_to_nlattr
  321       2.9184  __memzero
  220       2.0002  nfnetlink_has_listeners
  177       1.6092  nf_ct_l4proto_find_get
  155       1.4092  __nla_put
  116       1.0546  nf_ct_l3proto_find_get
  82        0.7455  module_put
  70        0.6364  nf_ct_l4proto_put
  66        0.6001  nf_ct_l3proto_put
  60        0.5455  nlmsg_notify
  43        0.3909  netlink_has_listeners
  42        0.3819  __kmalloc
  37        0.3364  kmem_cache_alloc
  26        0.2364  __nf_ct_l4proto_find
  13        0.1182  __irq_svc

nf_conntrack_event is now one of the first functions listed. Do you see other ways of improving performance?
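The time in nla_put and ipv4_tuple_to_nlattr makes sense to me: every field of the tuple goes out as its own attribute, roughly like this (abridged from nf_conntrack_l3proto_ipv4.c as I read it):

static int ipv4_tuple_to_nlattr(struct sk_buff *skb,
				const struct nf_conntrack_tuple *tuple)
{
	/* One attribute per address field; each NLA_PUT_BE32 checks
	 * tailroom, writes an attribute header and copies the payload
	 * via __nla_put. */
	NLA_PUT_BE32(skb, CTA_IP_V4_SRC, tuple->src.u3.ip);
	NLA_PUT_BE32(skb, CTA_IP_V4_DST, tuple->dst.u3.ip);
	return 0;

nla_put_failure:
	return -1;
}

With several attributes per event message, the per-attribute overhead adds up, which would fit nla_put sitting at the top of the list.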
Second:

  33        2.4775  __nf_ct_ext_add
  63        4.7297  dev_hard_start_xmit
  65        4.8799  sock_recvmsg
  77        5.7808  netif_receive_skb
  92        6.9069  __nla_put
  96        7.2072  nf_conntrack_alloc
  199      14.9399  nf_conntrack_in
  246      18.4685  skb_copy
  427      32.0571  nf_ct_invert_tuplepr
1793       2.3737  __memzero
  1793    100.000   __memzero [self]

Is the zeroing of the inverted tuple in nf_ct_invert_tuple really required? As far as I can see, all fields are set by the subsequent code.
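For context, nf_ct_invert_tuplepr() ends up in roughly the following (nf_conntrack_core.c, lightly abridged). One thing worth checking before dropping the memset is whether the per-protocol invert handlers really write every byte: for IPv4 only u3.ip of the 16-byte nf_inet_addr union is set, so the unused union bytes may rely on the zeroing for tuple comparison and hashing.

static int
nf_ct_invert_tuple(struct nf_conntrack_tuple *inverse,
		   const struct nf_conntrack_tuple *orig,
		   const struct nf_conntrack_l3proto *l3proto,
		   const struct nf_conntrack_l4proto *l4proto)
{
	memset(inverse, 0, sizeof(*inverse));	/* the __memzero from the profile */

	inverse->src.l3num = orig->src.l3num;
	if (l3proto->invert_tuple(inverse, orig) == 0)
		return 0;

	inverse->dst.dir = !orig->dst.dir;
	inverse->dst.protonum = orig->dst.protonum;
	return l4proto->invert_tuple(inverse, orig);
}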
Attachment:
opreport_cg_patch.tar.gz
Description: GNU Zip compressed data