On Tue, May 25, 2021 at 06:12:46PM -0500, Pearson, Robert B wrote: > > On 5/25/2021 6:03 PM, Jason Gunthorpe wrote: > > On Tue, May 25, 2021 at 03:11:12PM -0500, Bob Pearson wrote: > > > The rxe driver has recently begun exhibiting failures in the python > > > tests that are due to stale values read from the completion queue > > > producer or consumer indices. Unlike the other loads of these shared > > > indices those in queue_count() were not protected by smp_load_acquire(). > > > > > > This patch replaces loads by smp_load_acquire() in queue_count(). > > > The observed errors no longer occur. > > > > > > Reported-by: Zhu Yanjun <zyj2020@xxxxxxxxx> > > > Fixes: d21a1240f516 ("RDMA/rxe: Use acquire/release for memory ordering") > > > Signed-off-by: Bob Pearson <rpearsonhpe@xxxxxxxxx> > > > drivers/infiniband/sw/rxe/rxe_queue.h | 18 ++++++++++++++++-- > > > 1 file changed, 16 insertions(+), 2 deletions(-) > > > > > > diff --git a/drivers/infiniband/sw/rxe/rxe_queue.h b/drivers/infiniband/sw/rxe/rxe_queue.h > > > index 2902ca7b288c..5cb142282fa6 100644 > > > +++ b/drivers/infiniband/sw/rxe/rxe_queue.h > > > @@ -161,8 +161,22 @@ static inline unsigned int index_from_addr(const struct rxe_queue *q, > > > static inline unsigned int queue_count(const struct rxe_queue *q) > > > { > > > - return (q->buf->producer_index - q->buf->consumer_index) > > > - & q->index_mask; > > > + u32 prod; > > > + u32 cons; > > > + u32 count; > > > + > > > + /* make sure all changes to queue complete before > > > + * changing producer index > > > + */ > > > + prod = smp_load_acquire(&q->buf->producer_index); > > > + > > > + /* make sure all changes to queue complete before > > > + * changing consumer index > > > + */ > > > + cons = smp_load_acquire(&q->buf->consumer_index); > > > + count = (prod - cons) % q->index_mask; > > > + > > > + return count; > > > } > > It doesn't quite make sense to load both? > > > > Only the one written by userspace should require a load_acquire, the > > one written by the kernel should already be locked somehow with a > > kernel lock or there is a different problem > > > > But yes, this does look like a bug to me > > > > Jason > > The rxe_queue.h API is used by completion and work queues so both directions > occur. The question is the trade off between the logic to avoid the > load_acquire vs the branching to get around. Same point could be applied to > queue_empty() which already had both load_acquire() calls. Is it worth > creating several versions of APIs with different choices made? Hard to say.. I would probably have made special types for 'user producer' and 'user consumer' queues and had actually different behavior - the kernel controlled data should be shadowed in the kernel and only copied to the user - have to consider the user corrupting the data, for instance. Jason