On Mon, Sep 23, 2019 at 04:49:31PM +0200, Peter Zijlstra wrote:
> On Thu, Sep 19, 2019 at 02:59:06PM +0100, David Howells wrote:
> >
> > But I don't agree with this.  You're missing half the barriers.  There should
> > be *four* barriers.  The document mandates only 3 barriers, and uses
> > READ_ONCE() where the fourth should be, i.e.:
> >
> >     thread #1                       thread #2
> >
> >     smp_load_acquire(head)
> >     ... read data from queue ..
> >     smp_store_release(tail)
> >
> >                                     READ_ONCE(tail)
> >                                     ... add data to queue ..
> >                                     smp_store_release(head)
> >
>
> Notably your READ_ONCE() pseudo code is lacking a conditional;
> kernel/events/ring_buffer.c writes it like so:
>
>  *   kernel                             user
>  *
>  *   if (LOAD ->data_tail) {            LOAD ->data_head
>  *                      (A)             smp_rmb()       (C)
>  *      STORE $data                     LOAD $data
>  *      smp_wmb()       (B)             smp_mb()        (D)
>  *      STORE ->data_head               STORE ->data_tail
>  *   }
>  *
>  * Where A pairs with D, and B pairs with C.
>  *
>  * In our case (A) is a control dependency that separates the load of
>  * the ->data_tail and the stores of $data. In case ->data_tail
>  * indicates there is no room in the buffer to store $data we do not.

To elaborate on this, dependencies are tricky...  ;-)

For the record, the LKMM doesn't currently model "order" derived from
control dependencies to a _plain_ access (even if the plain access is
a write): in particular, the following is racy (as far as the current
LKMM is concerned):

C rb

{ }

P0(int *tail, int *data, int *head)
{
	if (READ_ONCE(*tail)) {
		*data = 1;
		smp_wmb();
		WRITE_ONCE(*head, 1);
	}
}

P1(int *tail, int *data, int *head)
{
	int r0;
	int r1;

	r0 = READ_ONCE(*head);
	smp_rmb();
	r1 = *data;
	smp_mb();
	WRITE_ONCE(*tail, 1);
}

Replacing the plain "*data = 1" with "WRITE_ONCE(*data, 1)" (or doing
s/READ_ONCE(*tail)/smp_load_acquire(tail)) suffices to avoid the race.
Maybe I'm short of imagination this morning...  but I can't currently
see how the compiler could "break" the above scenario.

I also didn't spend much time thinking about it.
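For readers more at home with C11 than with the LKMM, the smp_load_acquire() variant of the "rb" test can be sketched as an ordinary C11/pthreads program. This is an illustrative translation, not an LKMM litmus test: the C11 fences only stand in for smp_wmb(), smp_rmb() and smp_mb() (the release and acquire fences are stronger than the kernel barriers they replace), and the names p0, p1 and run_rb_litmus are hypothetical. Like the current LKMM, C11 derives no ordering from control dependencies, so if P0 used a relaxed load of tail, the two plain accesses to data would race; the acquire load gives P1's read of data a happens-before edge to P0's write.

```c
#include <pthread.h>
#include <stdatomic.h>
#include <stddef.h>

/* Shared state of the "rb" test; data stays a plain (non-atomic) int. */
static atomic_int tail, head;
static int data;
static int r0, r1;

/* P0: READ_ONCE(*tail) replaced by an acquire load, as suggested. */
static void *p0(void *arg)
{
	(void)arg;
	if (atomic_load_explicit(&tail, memory_order_acquire)) {
		data = 1;                                  /* plain store */
		atomic_thread_fence(memory_order_release); /* stands in for smp_wmb() */
		atomic_store_explicit(&head, 1, memory_order_relaxed);
	}
	return NULL;
}

/* P1: unchanged apart from the C11 spelling of the barriers. */
static void *p1(void *arg)
{
	(void)arg;
	r0 = atomic_load_explicit(&head, memory_order_relaxed);
	atomic_thread_fence(memory_order_acquire);         /* stands in for smp_rmb() */
	r1 = data;                                         /* plain load */
	atomic_thread_fence(memory_order_seq_cst);         /* stands in for smp_mb() */
	atomic_store_explicit(&tail, 1, memory_order_relaxed);
	return NULL;
}

/* Run the test 'iters' times; count runs where r0 or r1 ended up non-zero. */
static int run_rb_litmus(int iters)
{
	int bad = 0;

	for (int i = 0; i < iters; i++) {
		pthread_t t0, t1;

		/* Thread creation/join order these resets against both threads. */
		atomic_store(&tail, 0);
		atomic_store(&head, 0);
		data = 0;
		r0 = r1 = 0;
		pthread_create(&t1, NULL, p1, NULL);
		pthread_create(&t0, NULL, p0, NULL);
		pthread_join(t0, NULL);
		pthread_join(t1, NULL);
		if (r0 != 0 || r1 != 0)
			bad++;
	}
	return bad;
}
```

run_rb_litmus(1000) is expected to return 0: a non-zero r0 or r1 would require P1 to observe a store that P0 can only perform after reading P1's later store to tail, which would put a cycle in happens-before. Running it only samples executions, of course; the argument, not the test, is what rules the outcome out.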
memory-barriers.txt has a section "CONTROL DEPENDENCIES" dedicated to
"alerting developers using control dependencies for ordering".  That's
quite a long section (and probably still incomplete); the last
paragraph summarizes:  ;-)

  (*) Compilers do not understand control dependencies.  It is therefore
      your job to ensure that they do not break your code.

Andrea


>  *
>  * D needs to be a full barrier since it separates the data READ
>  * from the tail WRITE.
>  *
>  * For B a WMB is sufficient since it separates two WRITEs, and for C
>  * an RMB is sufficient since it separates two READs.
>
> Where 'kernel' is the producer and 'user' is the consumer. This was
> written before load-acquire and store-release came about (I _think_),
> and I've so far resisted updating B to store-release because smp_wmb()
> is actually cheaper than store-release on a number of architectures
> (notably ARM).
>
> C ought to be a load-acquire, and D really should be a store-release, but
> I don't think the perf userspace has that (or uses C11).
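The scheme discussed above, with C as a load-acquire and D as a store-release as a C11 user side could express it, can be sketched as a single-producer/single-consumer ring buffer. This is a minimal sketch under stated assumptions, not the perf ABI: the names ring, rb_put, rb_get, run_ring_demo, RB_SIZE and RB_ITEMS are hypothetical, head and tail are free-running counters masked by a power-of-two size, and because C11 has no notion of control dependencies, (A) is written as an explicit acquire load. B is a store-release here, which is the natural C11 spelling even though, as Peter notes, smp_wmb() can be cheaper on some architectures.

```c
#include <pthread.h>
#include <sched.h>
#include <stdatomic.h>
#include <stddef.h>

#define RB_SIZE  16UL    /* hypothetical capacity; must be a power of two */
#define RB_ITEMS 100000  /* hypothetical number of items to transfer */

struct ring {
	atomic_ulong head;   /* next slot to fill; written by the producer only */
	atomic_ulong tail;   /* next slot to drain; written by the consumer only */
	int data[RB_SIZE];   /* plain accesses, ordered through head/tail */
};

static struct ring rb;

/* Producer ("kernel" side): returns 0 if the buffer is full. */
static int rb_put(struct ring *r, int v)
{
	unsigned long h = atomic_load_explicit(&r->head, memory_order_relaxed);
	/* (A): acquire instead of a control dependency; pairs with (D). */
	unsigned long t = atomic_load_explicit(&r->tail, memory_order_acquire);

	if (h - t == RB_SIZE)
		return 0;
	r->data[h % RB_SIZE] = v;
	/* (B): publish the data before the new head; pairs with (C). */
	atomic_store_explicit(&r->head, h + 1, memory_order_release);
	return 1;
}

/* Consumer ("user" side): returns 0 if the buffer is empty. */
static int rb_get(struct ring *r, int *v)
{
	unsigned long t = atomic_load_explicit(&r->tail, memory_order_relaxed);
	/* (C): load-acquire of head; pairs with (B). */
	unsigned long h = atomic_load_explicit(&r->head, memory_order_acquire);

	if (h == t)
		return 0;
	*v = r->data[t % RB_SIZE];
	/* (D): release the slot only after reading it; pairs with (A). */
	atomic_store_explicit(&r->tail, t + 1, memory_order_release);
	return 1;
}

static void *producer(void *arg)
{
	(void)arg;
	for (int i = 1; i <= RB_ITEMS; i++)
		while (!rb_put(&rb, i))
			sched_yield();
	return NULL;
}

/* Sums the consumed items into *arg, or stores -1 if any arrived out of order. */
static void *consumer(void *arg)
{
	long long *sum = arg;
	int v, in_order = 1;

	*sum = 0;
	for (int i = 1; i <= RB_ITEMS; i++) {
		while (!rb_get(&rb, &v))
			sched_yield();
		if (v != i)
			in_order = 0;
		*sum += v;
	}
	if (!in_order)
		*sum = -1;
	return NULL;
}

/* Runs one producer and one consumer over the shared ring. */
static long long run_ring_demo(void)
{
	pthread_t p, c;
	long long sum = 0;

	atomic_store_explicit(&rb.head, 0, memory_order_relaxed);
	atomic_store_explicit(&rb.tail, 0, memory_order_relaxed);
	pthread_create(&c, NULL, consumer, &sum);
	pthread_create(&p, NULL, producer, NULL);
	pthread_join(p, NULL);
	pthread_join(c, NULL);
	return sum;
}
```

run_ring_demo() should return 5000050000 (the sum of 1..100000); -1 would mean items were observed out of order. Keeping the consumer draining even after a mismatch avoids leaving the producer spinning on a full buffer.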