On Fri, Jan 22, 2021 at 9:55 AM Lukas Wunner <lukas@xxxxxxxxx> wrote:
>
> sch_handle_egress() returns either the skb or NULL to signal to its
> caller __dev_queue_xmit() whether a packet should continue to be
> processed.
>
> The skb is always non-NULL, otherwise __dev_queue_xmit() would hit a
> NULL pointer deref right at its top.
>
> But the compiler doesn't know that. So if sch_handle_egress() signals
> success by returning the skb, the "if (!skb) goto out;" statement
> results in a gratuitous NULL pointer check in the assembler output.
>
> Avoid by telling the compiler that __dev_queue_xmit() is never passed
> a NULL skb. This also eliminates another gratuitous NULL pointer check
> in the call chain:
>
>   __dev_queue_xmit()
>     qdisc_pkt_len_init()
>       skb_header_pointer()
>         __skb_header_pointer()
>
> The speedup is barely measurable:
>
>   Before: 1877 1875 1878 1874 1882 1873 Mb/sec
>   After:  1877 1877 1880 1883 1888 1886 Mb/sec
>
> However we're about to add a netfilter egress hook to __dev_queue_xmit()
> and without the micro-optimization, it will result in a performance
> degradation which is indeed measurable:
>
>   With netfilter hook:                1853 1852 1850 1848 1849 1851 Mb/sec
>   With netfilter hook + micro-optim.: 1874 1877 1881 1875 1876 1876 Mb/sec
>
> The performance degradation is caused by a JNE instruction ("if (skb)")
> being flipped to a JE instruction ("if (!skb)") once the netfilter hook
> is added. The micro-optimization removes the test and jump instructions
> altogether.
>
> Measurements were performed on a Core i7-3615QM.
> Reproducer:
>
>   ip link add dev foo type dummy
>   ip link set dev foo up
>   tc qdisc add dev foo clsact
>   tc filter add dev foo egress bpf da bytecode '1,6 0 0 0,'
>   modprobe pktgen
>   echo "add_device foo" > /proc/net/pktgen/kpktgend_3
>   samples/pktgen/pktgen_bench_xmit_mode_queue_xmit.sh -i foo \
>       -n 400000000 -m "11:11:11:11:11:11" -d 1.1.1.1
>
> Signed-off-by: Lukas Wunner <lukas@xxxxxxxxx>
> Cc: John Fastabend <john.fastabend@xxxxxxxxx>
> Cc: Daniel Borkmann <daniel@xxxxxxxxxxxxx>
> Cc: Alexei Starovoitov <ast@xxxxxxxxxx>
> Cc: Eric Dumazet <edumazet@xxxxxxxxxx>
> Cc: Thomas Graf <tgraf@xxxxxxx>
> ---
>  net/core/dev.c | 1 +
>  1 file changed, 1 insertion(+)
>
> diff --git a/net/core/dev.c b/net/core/dev.c
> index 7afbb642e203..4c16b9932823 100644
> --- a/net/core/dev.c
> +++ b/net/core/dev.c
> @@ -4072,6 +4072,7 @@ struct netdev_queue *netdev_core_pick_tx(struct net_device *dev,
>   * the BH enable code must have IRQs enabled so that it will not deadlock.
>   * --BLG
>   */
> +__attribute__((nonnull(1)))
>  static int __dev_queue_xmit(struct sk_buff *skb, struct net_device *sb_dev)
>  {
>  	struct net_device *dev = skb->dev;
> --
> 2.29.2

Interesting! It is a bit sad the compilers do not automatically get this
knowledge from the very first instruction:

	struct net_device *dev = skb->dev;

I see this also makes qdisc_pkt_len_init() slightly faster, because this
removes the "if (!skb)" test from __skb_header_pointer().

I guess we also could add this patch to benefit all skb_header_pointer()
callers:

diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h
index 5f60c9e907c9d8eae1e85ae0329838383e3325df..db8774c50cc6ab99deaecdb1dfc31dc5306b9a37 100644
--- a/include/linux/skbuff.h
+++ b/include/linux/skbuff.h
@@ -3627,6 +3627,7 @@ __skb_header_pointer(const struct sk_buff *skb, int offset,
 	return buffer;
 }

+__attribute__((nonnull(1)))
 static inline void * __must_check
 skb_header_pointer(const struct sk_buff *skb, int offset, int len, void *buffer)
 {