On Wed, Apr 10, 2019 at 12:21 AM Magnus Karlsson
<magnus.karlsson@xxxxxxxxx> wrote:
>
> The use of smp_rmb() and smp_wmb() creates a Linux header dependency
> on barrier.h that is unnecessary in most parts. This patch implements
> the two small defines that are needed from barrier.h. As a bonus, the
> new implementations are faster than the default ones, as those default
> to sfence and lfence for x86, while we only need a compiler barrier in
> our case, just as when the same ring access code is compiled in the
> kernel.
>
> Fixes: 1cad07884239 ("libbpf: add support for using AF_XDP sockets")
> Signed-off-by: Magnus Karlsson <magnus.karlsson@xxxxxxxxx>
> ---
>  tools/lib/bpf/xsk.h | 20 ++++++++++++++++++--
>  1 file changed, 18 insertions(+), 2 deletions(-)
>
> diff --git a/tools/lib/bpf/xsk.h b/tools/lib/bpf/xsk.h
> index 3638147..69136d9 100644
> --- a/tools/lib/bpf/xsk.h
> +++ b/tools/lib/bpf/xsk.h
> @@ -39,6 +39,22 @@ DEFINE_XSK_RING(xsk_ring_cons);
>  struct xsk_umem;
>  struct xsk_socket;
>
> +#if !defined bpf_smp_rmb && !defined bpf_smp_wmb

Maybe add some comments to explain the difference between
bpf_smp_{r,w}mb and smp_{r,w}mb so later users will have a better idea
which to pick?

> +# if defined(__i386__) || defined(__x86_64__)
> +# define bpf_smp_rmb() asm volatile("" : : : "memory")
> +# define bpf_smp_wmb() asm volatile("" : : : "memory")
> +# elif defined(__aarch64__)
> +# define bpf_smp_rmb() asm volatile("dmb ishld" : : : "memory")
> +# define bpf_smp_wmb() asm volatile("dmb ishst" : : : "memory")
> +# elif defined(__arm__)
> +/* These are only valid for armv7 and above */
> +# define bpf_smp_rmb() asm volatile("dmb ish" : : : "memory")
> +# define bpf_smp_wmb() asm volatile("dmb ishst" : : : "memory")
> +# else
> +# error Architecture not supported by the XDP socket code in libbpf.
> +# endif
> +#endif

Since this is generic enough and could be used by other files as well,
maybe put it into libbpf_util.h?

> +
>  static inline __u64 *xsk_ring_prod__fill_addr(struct xsk_ring_prod *fill,
>  					      __u32 idx)
>  {
> @@ -119,7 +135,7 @@ static inline void xsk_ring_prod__submit(struct xsk_ring_prod *prod, size_t nb)
>  	/* Make sure everything has been written to the ring before signalling
>  	 * this to the kernel.
>  	 */
> -	smp_wmb();
> +	bpf_smp_wmb();
>
>  	*prod->producer += nb;
>  }
> @@ -133,7 +149,7 @@ static inline size_t xsk_ring_cons__peek(struct xsk_ring_cons *cons,
>  	/* Make sure we do not speculatively read the data before
>  	 * we have received the packet buffers from the ring.
>  	 */
> -	smp_rmb();
> +	bpf_smp_rmb();

Could you explain why a compiler barrier is good enough here on x86?
Note that the load of cons->cached_cons could be reordered with earlier
non-overlapping stores at runtime.

>
>  	*idx = cons->cached_cons;
>  	cons->cached_cons += entries;
> --
> 2.7.4
>
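A couple of sketches below to make my comments above more concrete.

As a strawman for the comment I am asking for (my wording, not from the
patch), something along these lines would already help:

/*
 * bpf_smp_rmb()/bpf_smp_wmb() provide only the load->load and
 * store->store ordering that the XDP ring accessors need. Unlike
 * smp_rmb()/smp_wmb() from the barrier.h in tools, they do not emit
 * lfence/sfence on x86, so they must not be used where ordering
 * against non-temporal stores or a full barrier is required.
 */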
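On why a compiler barrier can be enough at the two call sites on x86:
the pattern is the classic single-producer/single-consumer ring. A
minimal sketch (my names, not the xsk API; it assumes the
bpf_smp_{r,w}mb macros from the patch above):

#define RING_SIZE 64	/* power of two */

struct ring {
	volatile unsigned int prod;	/* written by producer only */
	volatile unsigned int cons;	/* written by consumer only */
	unsigned long long desc[RING_SIZE];
};

/* Producer: descriptor store must be visible before the index store. */
static void ring_put(struct ring *r, unsigned long long d)
{
	r->desc[r->prod & (RING_SIZE - 1)] = d;
	bpf_smp_wmb();		/* store->store: data before index */
	r->prod = r->prod + 1;
}

/* Consumer: index load must happen before the descriptor load. */
static int ring_get(struct ring *r, unsigned long long *d)
{
	if (r->cons == r->prod)	/* loads prod */
		return 0;
	bpf_smp_rmb();		/* load->load: index before data */
	*d = r->desc[r->cons & (RING_SIZE - 1)];
	r->cons = r->cons + 1;
	return 1;
}

The producer needs store->store ordering and the consumer needs
load->load ordering, and x86-TSO guarantees both; the barrier only has
to keep the compiler from reordering, hence "" ::: "memory" suffices
for those two cases.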
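What x86 does not guarantee is ordering of a load against an earlier
store to a different location, which is the point of my last question.
A quick-and-dirty demo of that (hypothetical; it may need many runs to
trigger, since thread startup dwarfs the race window):

#include <pthread.h>
#include <stdio.h>

static volatile int X, Y;
static volatile int r0, r1;

static void *t0(void *arg)
{
	X = 1;
	asm volatile("" : : : "memory");	/* compiler barrier, no fence */
	r0 = Y;
	return NULL;
}

static void *t1(void *arg)
{
	Y = 1;
	asm volatile("" : : : "memory");
	r1 = X;
	return NULL;
}

int main(void)
{
	for (int i = 0; i < 1000000; i++) {
		pthread_t a, b;

		X = Y = 0;
		pthread_create(&a, NULL, t0, NULL);
		pthread_create(&b, NULL, t1, NULL);
		pthread_join(a, NULL);
		pthread_join(b, NULL);
		if (r0 == 0 && r1 == 0) {	/* impossible under SC */
			printf("store->load reordering, run %d\n", i);
			return 0;
		}
	}
	printf("not observed\n");
	return 0;
}

So if anything after bpf_smp_rmb() relied on the load of
cons->cached_cons being ordered after an earlier store, a compiler
barrier would not be sufficient and a real fence would be needed; if
only load->load ordering is required there, the patch is fine.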