On Thu, Jul 1, 2021 at 9:20 PM Jay Vosburgh <jay.vosburgh@xxxxxxxxxxxxx> wrote:
>
> joamaki@xxxxxxxxx wrote:
>
> >From: Jussi Maki <joamaki@xxxxxxxxx>
> >
> >This patchset introduces XDP support to the bonding driver.
> >
> >The motivation for this change is to enable use of bonding (and
> >802.3ad) in hairpinning L4 load-balancers such as [1] implemented with
> >XDP and also to transparently support bond devices for projects that
> >use XDP, given most modern NICs have dual-port adapters. An alternative
> >to this approach would be to implement 802.3ad in user space and
> >implement the bonding load-balancing in the XDP program itself, but
> >this is a rather cumbersome endeavor in terms of slave device management
> >(e.g. by watching netlink) and requires separate programs for the native
> >vs. bond cases for the orchestrator. A native in-kernel implementation
> >overcomes these issues and provides more flexibility.
> >
> >Below are benchmark results done on two machines with a 100Gbit
> >Intel E810 (ice) NIC, with a 32-core 3970X on the sending machine and a
> >16-core 3950X on the receiving machine. 64-byte packets were sent with
> >pktgen-dpdk at full rate. Two issues [2, 3] were identified with the
> >ice driver, so the tests were performed with iommu=off and patch [2]
> >applied. Additionally, the bonding round-robin algorithm was modified
> >to use per-cpu tx counters, as high CPU load (50% vs 10%) and a high rate
> >of cache misses were caused by the shared rr_tx_counter. A fix for this
> >has already been merged into net-next. The statistics were collected
> >using "sar -n dev -u 1 10".
> >
> > -----------------------|  CPU   |--| rxpck/s  |--| txpck/s |----
> > without patch (1 dev):
> >   XDP_DROP:              3.15%      48.6Mpps
> >   XDP_TX:                3.12%      18.3Mpps      18.3Mpps
> >   XDP_DROP (RSS):        9.47%     116.5Mpps
> >   XDP_TX (RSS):          9.67%      25.3Mpps      24.2Mpps
> > -----------------------
> > with patch, bond (1 dev):
> >   XDP_DROP:              3.14%      46.7Mpps
> >   XDP_TX:                3.15%      13.9Mpps      13.9Mpps
> >   XDP_DROP (RSS):       10.33%     117.2Mpps
> >   XDP_TX (RSS):         10.64%      25.1Mpps      24.0Mpps
> > -----------------------
> > with patch, bond (2 devs):
> >   XDP_DROP:              6.27%      92.7Mpps
> >   XDP_TX:                6.26%      17.6Mpps      17.5Mpps
> >   XDP_DROP (RSS):       11.38%     117.2Mpps
> >   XDP_TX (RSS):         14.30%      28.7Mpps      27.4Mpps
> > --------------------------------------------------------------
>
> To be clear, the fact that the performance numbers for XDP_DROP
> and XDP_TX are lower for "with patch, bond (1 dev)" than "without patch
> (1 dev)" is expected, correct?

Yes, that is correct. With the patch, the ndo callback for choosing the
slave device is invoked, and in this test (mode=xor) it hashes the L2 & L3
headers (I seem to have failed to mention this in the original message).
In round-robin mode I recall it being about 16Mpps versus the 18Mpps
without the patch. I also tried "INDIRECT_CALL" to avoid going via
ndo_ops, but that had no discernible effect.
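
For reference, the extra per-packet work in xor mode boils down to
extracting the L2 & L3 addresses, hashing them, and reducing the hash to a
slave index before the frame can be queued. A rough standalone sketch of
that cost is below; the struct, field and function names are made up for
illustration and this is not the actual bonding hash code:

    /* Simplified sketch of a layer2+3-style transmit hash: roughly the
     * kind of per-packet work xor mode adds on top of the plain
     * single-device XDP path. Illustrative only, not the bonding driver.
     */
    #include <stdint.h>
    #include <stddef.h>
    #include <stdio.h>

    struct pkt_keys {
            uint8_t  src_mac[6];
            uint8_t  dst_mac[6];
            uint32_t src_ip;
            uint32_t dst_ip;
    };

    /* XOR the L2 and L3 addresses together, fold the result and reduce
     * it modulo the slave count to pick the tx slave. */
    static unsigned int pick_slave_l23(const struct pkt_keys *k,
                                       unsigned int slave_cnt)
    {
            uint32_t hash = 0;
            size_t i;

            for (i = 0; i < 6; i++)
                    hash ^= k->src_mac[i] ^ k->dst_mac[i];
            hash ^= k->src_ip ^ k->dst_ip;
            hash ^= hash >> 16;
            hash ^= hash >> 8;

            return hash % slave_cnt;
    }

    int main(void)
    {
            struct pkt_keys k = {
                    .src_mac = { 0x02, 0, 0, 0, 0, 0x01 },
                    .dst_mac = { 0x02, 0, 0, 0, 0, 0x02 },
                    .src_ip  = 0x0a000001,  /* 10.0.0.1 */
                    .dst_ip  = 0x0a000002,  /* 10.0.0.2 */
            };

            printf("slave index: %u\n", pick_slave_l23(&k, 2));
            return 0;
    }

The real driver derives the keys and folds the hash differently, but the
point is that every XDP_TX frame now pays for key extraction, the hash and
the indirect ndo call, which is consistent with the 18.3Mpps vs 13.9Mpps
delta in the table above.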