sch_handle_egress() returns either the skb or NULL to signal to its caller __dev_queue_xmit() whether a packet should continue to be processed. The skb is always non-NULL, otherwise __dev_queue_xmit() would hit a NULL pointer deref right at its top. But the compiler doesn't know that. So if sch_handle_egress() signals success by returning the skb, the "if (!skb) goto out;" statement results in a gratuitous NULL pointer check in the Assembler output. Avoid by telling the compiler that __dev_queue_xmit() is never passed a NULL skb. This also eliminates another gratuitous NULL pointer check in __dev_queue_xmit() qdisc_pkt_len_init() skb_header_pointer() __skb_header_pointer() The speedup is barely measurable: Before: 1877 1875 1878 1874 1882 1873 Mb/sec After: 1877 1877 1880 1883 1888 1886 Mb/sec However we're about to add a netfilter egress hook to __dev_queue_xmit() and without the micro-optimization, it will result in a performance degradation which is indeed measurable: With netfilter hook: 1853 1852 1850 1848 1849 1851 Mb/sec With netfilter hook + micro-optim: 1874 1877 1881 1875 1876 1876 Mb/sec The performance degradation is caused by a JNE instruction ("if (skb)") being flipped to a JE instruction ("if (!skb)") once the netfilter hook is added. The micro-optimization removes the test and jump instructions altogether. Measurements were performed on a Core i7-3615QM. Reproducer: ip link add dev foo type dummy ip link set dev foo up tc qdisc add dev foo clsact tc filter add dev foo egress bpf da bytecode '1,6 0 0 0,' modprobe pktgen echo "add_device foo" > /proc/net/pktgen/kpktgend_3 samples/pktgen/pktgen_bench_xmit_mode_queue_xmit.sh -i foo -n 400000000 -m "11:11:11:11:11:11" -d 1.1.1.1 Signed-off-by: Lukas Wunner <lukas@xxxxxxxxx> Cc: John Fastabend <john.fastabend@xxxxxxxxx> Cc: Daniel Borkmann <daniel@xxxxxxxxxxxxx> Cc: Alexei Starovoitov <ast@xxxxxxxxxx> Cc: Eric Dumazet <edumazet@xxxxxxxxxx> Cc: Thomas Graf <tgraf@xxxxxxx> --- net/core/dev.c | 1 + 1 file changed, 1 insertion(+) diff --git a/net/core/dev.c b/net/core/dev.c index 7afbb642e203..4c16b9932823 100644 --- a/net/core/dev.c +++ b/net/core/dev.c @@ -4072,6 +4072,7 @@ struct netdev_queue *netdev_core_pick_tx(struct net_device *dev, * the BH enable code must have IRQs enabled so that it will not deadlock. * --BLG */ +__attribute__((nonnull(1))) static int __dev_queue_xmit(struct sk_buff *skb, struct net_device *sb_dev) { struct net_device *dev = skb->dev; -- 2.29.2