On Tue, Jan 21, 2020 at 12:50:23PM +0100, Björn Töpel wrote:
> On Tue, 21 Jan 2020 at 00:51, Daniel Borkmann <daniel@xxxxxxxxxxxxx> wrote:
> >
> > On 1/20/20 10:21 AM, Björn Töpel wrote:
> > > From: Björn Töpel <bjorn.topel@xxxxxxxxx>
> > >
> > > Currently, the AF_XDP rings uses fences for the kernel-side
> > > produce/consume functions. By updating rings for
> > > load-acquire/store-release semantics, the full barrier (smp_mb()) on
> > > the consumer side can be replaced.
> > >
> > > Signed-off-by: Björn Töpel <bjorn.topel@xxxxxxxxx>
> >
> > If I'm not missing something from the ring update scheme, don't you also need
> > to adapt to STORE.rel ->producer with matching barrier in tools/lib/bpf/xsk.h ?
> >
>
> Daniel/John,
>
> Hmm, I was under the impression that *wasn't* the case. Quoting
> memory-barriers.txt:
>
> --8<--
> When dealing with CPU-CPU interactions, certain types of memory
> barrier should always be paired. A lack of appropriate pairing is
> almost certainly an error.
>
> General barriers pair with each other, though they also pair with most
> other types of barriers, albeit without multicopy atomicity. An
> acquire barrier pairs with a release barrier, but both may also pair
> with other barriers, including of course general barriers. A write
> barrier pairs with a data dependency barrier, a control dependency, an
> acquire barrier, a release barrier, a read barrier, or a general
> barrier. Similarly a read barrier, control dependency, or a data
> dependency barrier pairs with a write barrier, an acquire barrier, a
> release barrier, or a general barrier:
> -->8--

The key part here is "albeit without multicopy atomicity". I don't think
you care about that at all for these rings as you're very clearly passing a
message from the producer side to the consumer side in a point-to-point like
manner, so I think you're ok to change the kernel independently from
userspace (but I would still recommend updating both eventually).

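To make the point-to-point pairing concrete, here is a minimal userspace
sketch using C11 atomics. It is not the xsk queue code: struct ring,
ring_produce(), ring_consume() and RING_SIZE are hypothetical names, and the
C11 acquire/release orderings are only rough analogues of
smp_load_acquire()/smp_store_release(). It assumes exactly one producer and
one consumer.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

#define RING_SIZE 8u /* hypothetical size, must be a power of two */

static_assert((RING_SIZE & (RING_SIZE - 1)) == 0,
	      "RING_SIZE must be a power of two");

/* Hypothetical single-producer/single-consumer ring. */
struct ring {
	_Atomic uint32_t producer; /* written only by the producer */
	_Atomic uint32_t consumer; /* written only by the consumer */
	uint64_t desc[RING_SIZE];
};

/* Producer side: fill a slot, then publish it with a store-release. */
static bool ring_produce(struct ring *r, uint64_t val)
{
	uint32_t prod = atomic_load_explicit(&r->producer,
					     memory_order_relaxed);
	/*
	 * Acquire pairs with the consumer's release of ->consumer, so the
	 * consumer has finished reading a slot before we overwrite it.
	 */
	uint32_t cons = atomic_load_explicit(&r->consumer,
					     memory_order_acquire);

	if (prod - cons == RING_SIZE)
		return false; /* ring is full */

	r->desc[prod & (RING_SIZE - 1)] = val;
	/* Release orders the slot write before the index update. */
	atomic_store_explicit(&r->producer, prod + 1, memory_order_release);
	return true;
}

/* Consumer side: load-acquire the index, then read the slot. */
static bool ring_consume(struct ring *r, uint64_t *val)
{
	uint32_t cons = atomic_load_explicit(&r->consumer,
					     memory_order_relaxed);
	/* Acquire pairs with the producer's release of ->producer. */
	uint32_t prod = atomic_load_explicit(&r->producer,
					     memory_order_acquire);

	if (prod == cons)
		return false; /* ring is empty */

	*val = r->desc[cons & (RING_SIZE - 1)];
	/* Release orders the slot read before handing the slot back. */
	atomic_store_explicit(&r->consumer, cons + 1, memory_order_release);
	return true;
}
```

Only two parties ever touch each index, which is why multicopy atomicity
doesn't come into it. Per the text quoted above, a producer that still does
smp_wmb() followed by a plain store to ->producer would also pair with the
load-acquire on the consumer side.
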
The only thing you might run into is if anybody is relying on the smp_mb()
in the consumer to order other unrelated stuff either side of the consume
operation (or even another consume operation to a different ring!), but it
looks like you can't rely on that in the xsk queue implementation anyway
because you cache the global state and so the barriers are conditional.

Will