On Thu, 28 Jan 2021 20:51:23 -0700 David Ahern <dsahern@xxxxxxxxx> wrote:

> On 1/26/21 10:39 AM, Jesper Dangaard Brouer wrote:
> > The current layout of net_device is not optimal for cacheline usage.
> >
> > The adj_list.lower linked-list member is split between cachelines 2 and 3.
> > The ifindex is placed together with stats (struct net_device_stats),
> > although most modern drivers don't update this stats member.
> >
> > The members netdev_ops, mtu and hard_header_len are placed on three
> > different cachelines. These members are accessed on XDP redirect into a
> > devmap, which was noticeable with the perf tool. When not using the map
> > redirect variant (as TC-BPF does), ifindex is also used, which is
> > placed on a separate, fourth cacheline. These members are also accessed
> > during forwarding with the regular network stack. The members priv_flags
> > and flags are on the fast path of the network stack transmit path in
> > __dev_queue_xmit (currently located together with the mtu cacheline).
> >
> > This patch creates a read-mostly cacheline, with the purpose of keeping
> > the above-mentioned members on the same cacheline.
> >
> > Some netdev_features_t members also become part of this cacheline, which
> > is on purpose, as the function netif_skb_features() is on the fast path
> > via validate_xmit_skb().
>
> A long overdue look at the organization of this struct. Do you have
> performance numbers for the XDP case?

Yes, my measurements are documented here:
https://github.com/xdp-project/xdp-project/blob/master/areas/core/xdp_redir01_net_device.org

Calculated improvements of xdp_redirect_map on driver i40e:
 * (1/12115061 - 1/12906785) * 10^9 = 5.06 ns
 * ((12906785/12115061) - 1) * 100  = 6.54%

-- 
Best regards,
  Jesper Dangaard Brouer
  MSc.CS, Principal Kernel Engineer at Red Hat
  LinkedIn: http://www.linkedin.com/in/brouer
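
P.S. For readers who want to sanity-check the numbers above: a short sketch
reproducing the two calculations. The packets-per-second figures are the ones
quoted in the mail (from the linked measurement doc); the variable names are
mine, not from the kernel source.

```python
# Reproduce the improvement calculation from the mail.
# pps figures quoted above (i40e, xdp_redirect_map); names are illustrative.
baseline_pps = 12_115_061   # packets/sec before the cacheline reorg
patched_pps  = 12_906_785   # packets/sec with the read-mostly cacheline

# Nanoseconds saved per packet: difference of per-packet costs (1/pps),
# converted from seconds to nanoseconds.
ns_saved = (1 / baseline_pps - 1 / patched_pps) * 1e9

# Relative throughput improvement in percent.
pct_gain = (patched_pps / baseline_pps - 1) * 100

print(f"{ns_saved:.2f} ns saved per packet")   # ~5.06 ns
print(f"{pct_gain:.2f}% more packets/sec")     # ~6.54%
```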