XDP_TX is similar to XDP_REDIRECT in that it essentially redirects packets to
the device itself. XDP_REDIRECT has a bulk transmit mechanism to avoid the
heavy cost of indirect calls, and it also reduces lock acquisition on
destination devices that need locks, such as veth and tun.
XDP_TX does not use indirect calls, but drivers which require locks can
benefit from bulk transmit for XDP_TX as well.

This patch adds per-cpu queues which can be used for bulk transmit on
XDP_TX.

I did not add functions like enqueue/flush but exposed the queue directly,
because we should avoid indirect calls on XDP_TX.

Note that the queue must be flushed, i.e. the "count" member must be set
to 0, when the NAPI handler which used this queue exits. Otherwise packets
left in the queue will be transmitted from completely unintended devices.

Signed-off-by: Toshiaki Makita <makita.toshiaki@xxxxxxxxxxxxx>
---
 include/net/xdp.h | 7 +++++++
 net/core/xdp.c    | 3 +++
 2 files changed, 10 insertions(+)

diff --git a/include/net/xdp.h b/include/net/xdp.h
index 0f25b36..30b36c8 100644
--- a/include/net/xdp.h
+++ b/include/net/xdp.h
@@ -84,6 +84,13 @@ struct xdp_frame {
 	struct net_device *dev_rx; /* used by cpumap */
 };
 
+#define XDP_TX_BULK_SIZE 16
+struct xdp_tx_bulk_queue {
+	struct xdp_frame *q[XDP_TX_BULK_SIZE];
+	unsigned int count;
+};
+DECLARE_PER_CPU(struct xdp_tx_bulk_queue, xdp_tx_bq);
+
 /* Clear kernel pointers in xdp_frame */
 static inline void xdp_scrub_frame(struct xdp_frame *frame)
 {
diff --git a/net/core/xdp.c b/net/core/xdp.c
index 4b2b194..0622f2d 100644
--- a/net/core/xdp.c
+++ b/net/core/xdp.c
@@ -40,6 +40,9 @@ struct xdp_mem_allocator {
 	struct rcu_head rcu;
 };
 
+DEFINE_PER_CPU(struct xdp_tx_bulk_queue, xdp_tx_bq);
+EXPORT_PER_CPU_SYMBOL_GPL(xdp_tx_bq);
+
 static u32 xdp_mem_id_hashfn(const void *data, u32 len, u32 seed)
 {
 	const u32 *k = data;
-- 
1.8.3.1
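
For reference, below is a minimal driver-side sketch (not part of this patch)
of how the per-cpu queue is expected to be used from a NAPI poll handler.
my_driver_xmit_xdp_frames(), my_driver_xdp_tx_flush() and my_driver_xdp_tx()
are hypothetical names standing in for the driver's own routines; only
xdp_tx_bq, struct xdp_tx_bulk_queue and XDP_TX_BULK_SIZE come from this patch.

#include <linux/netdevice.h>
#include <net/xdp.h>

/* Hypothetical driver routine: transmits 'n' frames while taking the
 * device's TX lock once for the whole batch.
 */
void my_driver_xmit_xdp_frames(struct net_device *dev,
			       struct xdp_frame **frames, unsigned int n);

static void my_driver_xdp_tx_flush(struct net_device *dev,
				   struct xdp_tx_bulk_queue *bq)
{
	if (!bq->count)
		return;

	my_driver_xmit_xdp_frames(dev, bq->q, bq->count);

	/* Leave the queue empty so leftover frames are never picked up
	 * later by an unrelated device sharing the same per-cpu queue.
	 */
	bq->count = 0;
}

/* Called from the XDP_TX path of the driver's NAPI poll handler
 * (softirq context, so this_cpu_ptr() is safe here).
 */
static void my_driver_xdp_tx(struct net_device *dev, struct xdp_frame *xdpf)
{
	struct xdp_tx_bulk_queue *bq = this_cpu_ptr(&xdp_tx_bq);

	if (bq->count == XDP_TX_BULK_SIZE)
		my_driver_xdp_tx_flush(dev, bq);

	bq->q[bq->count++] = xdpf;
}

/* Before the NAPI poll handler returns, the driver must flush:
 *
 *	my_driver_xdp_tx_flush(dev, this_cpu_ptr(&xdp_tx_bq));
 */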