On Thu, Sep 19, 2019 at 02:59:06PM +0100, David Howells wrote: > But I don't agree with this. You're missing half the barriers. There should > be *four* barriers. The document mandates only 3 barriers, and uses > READ_ONCE() where the fourth should be, i.e.: > > thread #1 thread #2 > > smp_load_acquire(head) > ... read data from queue .. > smp_store_release(tail) > > READ_ONCE(tail) > ... add data to queue .. > smp_store_release(head) > Notably your READ_ONCE() pseudo code is lacking a conditional; kernel/events/ring_buffer.c writes it like so: * kernel user * * if (LOAD ->data_tail) { LOAD ->data_head * (A) smp_rmb() (C) * STORE $data LOAD $data * smp_wmb() (B) smp_mb() (D) * STORE ->data_head STORE ->data_tail * } * * Where A pairs with D, and B pairs with C. * * In our case (A) is a control dependency that separates the load of * the ->data_tail and the stores of $data. In case ->data_tail * indicates there is no room in the buffer to store $data we do not. * * D needs to be a full barrier since it separates the data READ * from the tail WRITE. * * For B a WMB is sufficient since it separates two WRITEs, and for C * an RMB is sufficient since it separates two READs. Where 'kernel' is the producer and 'user' is the consumer. This was written before load-acquire and store-release came about (I _think_), and I've so far resisted updating B to store-release because smp_wmb() is actually cheaper than store-release on a number of architectures (notably ARM). C ought to be a load-aquire, and D really should be a store-release, but I don't think the perf userspace has that (or uses C11).