Am 09.04.19 um 13:29 schrieb Magnus Karlsson: > On Tue, Apr 9, 2019 at 11:11 AM Daniel Borkmann <daniel@xxxxxxxxxxxxx> wrote: >> >> On 04/09/2019 08:44 AM, Magnus Karlsson wrote: >>> The use of smp_rmb() and smp_wmb() creates a Linux header dependency >>> on barrier.h that is uneccessary in most parts. This patch implements >>> the two small defines that are needed from barrier.h. As a bonus, the >>> new implementations are faster than the default ones as they default >>> to sfence and lfence for x86, while we only need a compiler barrier in >>> our case. Just as it is when the same ring access code is compiled in >>> the kernel. >>> >>> Fixes: 1cad07884239 ("libbpf: add support for using AF_XDP sockets") >>> Signed-off-by: Magnus Karlsson <magnus.karlsson@xxxxxxxxx> >>> --- >>> tools/lib/bpf/xsk.h | 19 +++++++++++++++++-- >>> 1 file changed, 17 insertions(+), 2 deletions(-) >>> >>> diff --git a/tools/lib/bpf/xsk.h b/tools/lib/bpf/xsk.h >>> index 3638147..317b44f 100644 >>> --- a/tools/lib/bpf/xsk.h >>> +++ b/tools/lib/bpf/xsk.h >>> @@ -39,6 +39,21 @@ DEFINE_XSK_RING(xsk_ring_cons); >>> struct xsk_umem; >>> struct xsk_socket; >>> >>> +#if !defined bpf_smp_rmb && !defined bpf_smp_wmb >>> +# if defined(__i386__) || defined(__x86_64__) >>> +# define bpf_smp_rmb() asm volatile("" : : : "memory") >>> +# define bpf_smp_wmb() asm volatile("" : : : "memory") >>> +# elif defined(__aarch64__) >>> +# define bpf_smp_rmb() asm volatile("dmb ishld" : : : "memory") >>> +# define bpf_smp_wmb() asm volatile("dmb ishst" : : : "memory") >>> +# elif defined(__arm__) >>> +# define bpf_smp_rmb() asm volatile("dmb ish" : : : "memory") >>> +# define bpf_smp_wmb() asm volatile("dmb ishst" : : : "memory") >>> +# else >>> +# error Architecture not supported by the XDP socket code in libbpf. >>> +# endif >>> +#endif >> >> Hmm, bit unfortunate, but I can see why it's needed inside the header here >> as it's installed into the system and needs to be included by applications. >> Fine with me. In any case, I still think we should also fix it for tooling >> infra to avoid others running into the same issue, lemme resurrect [0] from >> back in the days. >> >> One question though regarding above arm32 implementation. You match these >> barriers against the ones from af_xdp in kernel. So smp_* variants from the >> arch/arm/include/asm/barrier.h header need to be matched. This selects the >> dmb(ish) and dmb(ishst) variant. The implementation of dmb() however is >> quite CPU dependent, the above assumes to match with __LINUX_ARM_ARCH__ >= 7. >> Perhaps this assumption needs at minimum a comment to make it clear for >> developers. > > Good idea. I will put a comment in there to clarify this. > > /Magnus Since this is a userspace library, what about using the C11 atomic_thread_fence() from stdatomic.h - wouldn't that be sufficient? >> [0] https://lore.kernel.org/netdev/20181017144156.16639-2-daniel@xxxxxxxxxxxxx/ >> >>> static inline __u64 *xsk_ring_prod__fill_addr(struct xsk_ring_prod *fill, >>> __u32 idx) >>> { >>> @@ -119,7 +134,7 @@ static inline void xsk_ring_prod__submit(struct xsk_ring_prod *prod, size_t nb) >>> /* Make sure everything has been written to the ring before signalling >>> * this to the kernel. >>> */ >>> - smp_wmb(); >>> + bpf_smp_wmb(); >>> >>> *prod->producer += nb; >>> } >>> @@ -133,7 +148,7 @@ static inline size_t xsk_ring_cons__peek(struct xsk_ring_cons *cons, >>> /* Make sure we do not speculatively read the data before >>> * we have received the packet buffers from the ring. >>> */ >>> - smp_rmb(); >>> + bpf_smp_rmb(); >>> >>> *idx = cons->cached_cons; >>> cons->cached_cons += entries; >>> >>