Re: [PATCH bpf-next 1/2] xsk: update rings for load-acquire/store-release semantics

Toke Høiland-Jørgensen <toke@xxxxxxxxxx> · Mon, 01 Mar 2021 17:08:43 +0100

Björn Töpel <bjorn.topel@xxxxxxxxx> writes:

> From: Björn Töpel <bjorn.topel@xxxxxxxxx>
>
> Currently, the AF_XDP rings use smp_{r,w,}mb() fences on the
> kernel-side. By updating the rings for load-acquire/store-release
> semantics, the full barrier on the consumer side can be replaced,
> with improved performance as a nice side-effect.
>
> Note that this change does *not* require similar changes on the
> libbpf/userland side; however, it is recommended [1].
>
> On x86-64 systems, by removing the smp_mb() on the Rx and Tx side, the
> l2fwd AF_XDP xdpsock sample performance increases by
> 1%. Weakly-ordered platforms, such as ARM64, might benefit even more.
>
> [1] https://lore.kernel.org/bpf/20200316184423.GA14143@willie-the-truck/
>
> Signed-off-by: Björn Töpel <bjorn.topel@xxxxxxxxx>
> ---
>  net/xdp/xsk_queue.h | 27 +++++++++++----------------
>  1 file changed, 11 insertions(+), 16 deletions(-)
>
> diff --git a/net/xdp/xsk_queue.h b/net/xdp/xsk_queue.h
> index 2823b7c3302d..e24279d8d845 100644
> --- a/net/xdp/xsk_queue.h
> +++ b/net/xdp/xsk_queue.h
> @@ -47,19 +47,18 @@ struct xsk_queue {
>  	u64 queue_empty_descs;
>  };
>  
> -/* The structure of the shared state of the rings are the same as the
> - * ring buffer in kernel/events/ring_buffer.c. For the Rx and completion
> - * ring, the kernel is the producer and user space is the consumer. For
> - * the Tx and fill rings, the kernel is the consumer and user space is
> - * the producer.
> +/* The structure of the shared state of the rings are a simple
> + * circular buffer, as outlined in
> + * Documentation/core-api/circular-buffers.rst. For the Rx and
> + * completion ring, the kernel is the producer and user space is the
> + * consumer. For the Tx and fill rings, the kernel is the consumer and
> + * user space is the producer.
>   *
>   * producer                         consumer
>   *
> - * if (LOAD ->consumer) {           LOAD ->producer
> - *                    (A)           smp_rmb()       (C)
> + * if (LOAD ->consumer) {  (A)      LOAD.acq ->producer  (C)

Why is LOAD.acq not needed on the consumer side?
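
For reference, here is a minimal userspace sketch of the acquire/release
pairing being discussed, written with C11 atomics rather than the kernel
primitives. Everything in it (struct ring, ring_push(), ring_pop(),
RING_SIZE) is hypothetical and not taken from xsk_queue.h. Note that the
sketch uses an acquire load on *both* indices: portable C11 gives no
ordering guarantee from a plain load followed by a conditional store, so
this says nothing about whether the kernel side can get away with a
plain LOAD of ->consumer.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical ring size; must be a power of two so masking works. */
#define RING_SIZE 8u
_Static_assert((RING_SIZE & (RING_SIZE - 1)) == 0,
	       "RING_SIZE must be a power of two");

/* Hypothetical single-producer/single-consumer ring. */
struct ring {
	_Atomic uint32_t producer;	/* written only by the producer */
	_Atomic uint32_t consumer;	/* written only by the consumer */
	uint64_t desc[RING_SIZE];
};

/* Producer side: returns false if the ring is full. */
static bool ring_push(struct ring *r, uint64_t v)
{
	/* Our own index: nobody else writes it, relaxed is enough. */
	uint32_t prod = atomic_load_explicit(&r->producer,
					     memory_order_relaxed);
	/* Pairs with the release store of ->consumer in ring_pop(). */
	uint32_t cons = atomic_load_explicit(&r->consumer,
					     memory_order_acquire);

	if (prod - cons == RING_SIZE)
		return false;

	r->desc[prod & (RING_SIZE - 1)] = v;	/* STORE $data */
	/* Publish the entry: the data store above cannot pass this. */
	atomic_store_explicit(&r->producer, prod + 1, memory_order_release);
	return true;
}

/* Consumer side: returns false if the ring is empty. */
static bool ring_pop(struct ring *r, uint64_t *v)
{
	uint32_t cons = atomic_load_explicit(&r->consumer,
					     memory_order_relaxed);
	/* Pairs with the release store of ->producer in ring_push(). */
	uint32_t prod = atomic_load_explicit(&r->producer,
					     memory_order_acquire);

	if (prod == cons)
		return false;

	*v = r->desc[cons & (RING_SIZE - 1)];	/* LOAD $data */
	/* Hand the slot back: the data load above cannot pass this. */
	atomic_store_explicit(&r->consumer, cons + 1, memory_order_release);
	return true;
}
```

The indices are free-running u32 counters, so "prod - cons" gives the
fill level even across wrap-around.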

-Toke