On Wed, May 26, 2021 at 6:27 AM Bob Pearson <rpearsonhpe@xxxxxxxxx> wrote: > > The rxe driver has recently begun exhibiting failures in the python > tests that are due to stale values read from the completion queue > producer or consumer indices. Unlike the other loads of these shared > indices those in queue_count() were not protected by smp_load_acquire(). > > This patch replaces loads by smp_load_acquire() in queue_count(). > The observed errors no longer occur. > > Reported-by: Zhu Yanjun <zyj2020@xxxxxxxxx> My email address is wrong. It should be zyjzyj2000@xxxxxxxxx Zhu Yanjun > Fixes: d21a1240f516 ("RDMA/rxe: Use acquire/release for memory ordering") > Signed-off-by: Bob Pearson <rpearsonhpe@xxxxxxxxx> > --- > drivers/infiniband/sw/rxe/rxe_queue.h | 18 ++++++++++++++++-- > 1 file changed, 16 insertions(+), 2 deletions(-) > > diff --git a/drivers/infiniband/sw/rxe/rxe_queue.h b/drivers/infiniband/sw/rxe/rxe_queue.h > index 2902ca7b288c..5cb142282fa6 100644 > --- a/drivers/infiniband/sw/rxe/rxe_queue.h > +++ b/drivers/infiniband/sw/rxe/rxe_queue.h > @@ -161,8 +161,22 @@ static inline unsigned int index_from_addr(const struct rxe_queue *q, > > static inline unsigned int queue_count(const struct rxe_queue *q) > { > - return (q->buf->producer_index - q->buf->consumer_index) > - & q->index_mask; > + u32 prod; > + u32 cons; > + u32 count; > + > + /* make sure all changes to queue complete before > + * changing producer index > + */ > + prod = smp_load_acquire(&q->buf->producer_index); > + > + /* make sure all changes to queue complete before > + * changing consumer index > + */ > + cons = smp_load_acquire(&q->buf->consumer_index); > + count = (prod - cons) % q->index_mask; > + > + return count; > } > > static inline void *queue_head(struct rxe_queue *q) > -- > 2.30.2 >