On Wed, May 29, 2019 at 05:06:40PM +0100, David Howells wrote: > Looking at the perf ring buffer, there appears to be a missing barrier in > perf_aux_output_end(): > > rb->user_page->aux_head = rb->aux_head; > > should be: > > smp_store_release(&rb->user_page->aux_head, rb->aux_head); I've answered that in another email; the aux bit is 'magic'. > It should also be using smp_load_acquire(). See > Documentation/core-api/circular-buffers.rst We use the control dependency instead, as described in the comment of perf_output_put_handle(): * kernel user * * if (LOAD ->data_tail) { LOAD ->data_head * (A) smp_rmb() (C) * STORE $data LOAD $data * smp_wmb() (B) smp_mb() (D) * STORE ->data_head STORE ->data_tail * } * * Where A pairs with D, and B pairs with C. * * In our case (A) is a control dependency that separates the load of * the ->data_tail and the stores of $data. In case ->data_tail * indicates there is no room in the buffer to store $data we do not. * * D needs to be a full barrier since it separates the data READ * from the tail WRITE. * * For B a WMB is sufficient since it separates two WRITEs, and for C * an RMB is sufficient since it separates two READs. Userspace can choose to use smp_load_acquire() over the first smp_rmb() if that is efficient for the architecture (for w ahole bunch of archs load-acquire would end up using mb() while rmb() is adequate and cheaper).