On Wed, Apr 10, 2019 at 12:21 AM Magnus Karlsson
<magnus.karlsson@xxxxxxxxx> wrote:
>
> The use of smp_rmb() and smp_wmb() creates a Linux header dependency
> on barrier.h that is unnecessary in most parts. This patch implements
> the two small defines that are needed from barrier.h. As a bonus, the
> new implementations are faster than the default ones, as those default
> to sfence and lfence for x86, while we only need a compiler barrier in
> our case, just as when the same ring access code is compiled in the
> kernel.
>
> Fixes: 1cad07884239 ("libbpf: add support for using AF_XDP sockets")
> Signed-off-by: Magnus Karlsson <magnus.karlsson@xxxxxxxxx>
> ---
>  tools/lib/bpf/xsk.h | 20 ++++++++++++++++++--
>  1 file changed, 18 insertions(+), 2 deletions(-)
>
> diff --git a/tools/lib/bpf/xsk.h b/tools/lib/bpf/xsk.h
> index 3638147..69136d9 100644
> --- a/tools/lib/bpf/xsk.h
> +++ b/tools/lib/bpf/xsk.h
> @@ -39,6 +39,22 @@ DEFINE_XSK_RING(xsk_ring_cons);
>  struct xsk_umem;
>  struct xsk_socket;
>
> +#if !defined bpf_smp_rmb && !defined bpf_smp_wmb

Maybe add some comments to explain the difference between
bpf_smp_{r,w}mb and smp_{r,w}mb so later users will have a better idea
which to pick?

> +# if defined(__i386__) || defined(__x86_64__)
> +# define bpf_smp_rmb() asm volatile("" : : : "memory")
> +# define bpf_smp_wmb() asm volatile("" : : : "memory")
> +# elif defined(__aarch64__)
> +# define bpf_smp_rmb() asm volatile("dmb ishld" : : : "memory")
> +# define bpf_smp_wmb() asm volatile("dmb ishst" : : : "memory")
> +# elif defined(__arm__)
> +/* These are only valid for armv7 and above */
> +# define bpf_smp_rmb() asm volatile("dmb ish" : : : "memory")
> +# define bpf_smp_wmb() asm volatile("dmb ishst" : : : "memory")
> +# else
> +# error Architecture not supported by the XDP socket code in libbpf.
> +# endif
> +#endif

Since this is generic enough and could be used by other files as well,
maybe put it into libbpf_util.h?

> +
>  static inline __u64 *xsk_ring_prod__fill_addr(struct xsk_ring_prod *fill,
>  					      __u32 idx)
>  {
> @@ -119,7 +135,7 @@ static inline void xsk_ring_prod__submit(struct xsk_ring_prod *prod, size_t nb)
>  	/* Make sure everything has been written to the ring before signalling
>  	 * this to the kernel.
>  	 */
> -	smp_wmb();
> +	bpf_smp_wmb();
>
>  	*prod->producer += nb;
>  }
> @@ -133,7 +149,7 @@ static inline size_t xsk_ring_cons__peek(struct xsk_ring_cons *cons,
>  	/* Make sure we do not speculatively read the data before
>  	 * we have received the packet buffers from the ring.
>  	 */
> -	smp_rmb();
> +	bpf_smp_rmb();

Could you explain why a compiler barrier is good enough here on x86?
Note that the load of cons->cached_cons could be reordered with earlier
non-overlapping stores at runtime.

>
>  	*idx = cons->cached_cons;
>  	cons->cached_cons += entries;
> --
> 2.7.4
>
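A couple of sketches below to make my comments above more concrete.

As a strawman for the comment I am asking for (my wording, not from the
patch), something along these lines would already help:

/*
 * bpf_smp_rmb()/bpf_smp_wmb() provide only the load->load and
 * store->store ordering that the XDP ring accessors need. Unlike
 * smp_rmb()/smp_wmb() from the barrier.h in tools, they do not emit
 * lfence/sfence on x86, so they must not be used where ordering
 * against non-temporal stores or a full barrier is required.
 */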
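On why a compiler barrier can be enough at the two call sites on x86:
the pattern is the classic single-producer/single-consumer ring. A
minimal sketch (my names, not the xsk API; it assumes the
bpf_smp_{r,w}mb macros from the patch above):

#define RING_SIZE 64	/* power of two */

struct ring {
	volatile unsigned int prod;	/* written by producer only */
	volatile unsigned int cons;	/* written by consumer only */
	unsigned long long desc[RING_SIZE];
};

/* Producer: descriptor store must be visible before the index store. */
static void ring_put(struct ring *r, unsigned long long d)
{
	r->desc[r->prod & (RING_SIZE - 1)] = d;
	bpf_smp_wmb();		/* store->store: data before index */
	r->prod = r->prod + 1;
}

/* Consumer: index load must happen before the descriptor load. */
static int ring_get(struct ring *r, unsigned long long *d)
{
	if (r->cons == r->prod)	/* loads prod */
		return 0;
	bpf_smp_rmb();		/* load->load: index before data */
	*d = r->desc[r->cons & (RING_SIZE - 1)];
	r->cons = r->cons + 1;
	return 1;
}

The producer needs store->store ordering and the consumer needs
load->load ordering, and x86-TSO guarantees both; the barrier only has
to keep the compiler from reordering, hence "" ::: "memory" suffices
for those two cases.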
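What x86 does not guarantee is ordering of a load against an earlier
store to a different location, which is the point of my last question.
A quick-and-dirty demo of that (hypothetical; it may need many runs to
trigger, since thread startup dwarfs the race window):

#include <pthread.h>
#include <stdio.h>

static volatile int X, Y;
static volatile int r0, r1;

static void *t0(void *arg)
{
	X = 1;
	asm volatile("" : : : "memory");	/* compiler barrier, no fence */
	r0 = Y;
	return NULL;
}

static void *t1(void *arg)
{
	Y = 1;
	asm volatile("" : : : "memory");
	r1 = X;
	return NULL;
}

int main(void)
{
	for (int i = 0; i < 1000000; i++) {
		pthread_t a, b;

		X = Y = 0;
		pthread_create(&a, NULL, t0, NULL);
		pthread_create(&b, NULL, t1, NULL);
		pthread_join(a, NULL);
		pthread_join(b, NULL);
		if (r0 == 0 && r1 == 0) {	/* impossible under SC */
			printf("store->load reordering, run %d\n", i);
			return 0;
		}
	}
	printf("not observed\n");
	return 0;
}

So if anything after bpf_smp_rmb() relied on the load of
cons->cached_cons being ordered after an earlier store, a compiler
barrier would not be sufficient and a real fence would be needed; if
only load->load ordering is required there, the patch is fine.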