On 5/25/2021 6:03 PM, Jason Gunthorpe wrote:
On Tue, May 25, 2021 at 03:11:12PM -0500, Bob Pearson wrote:
The rxe driver has recently begun exhibiting failures in the python
tests that are due to stale values read from the completion queue
producer or consumer indices. Unlike the other loads of these shared
indices those in queue_count() were not protected by smp_load_acquire().
This patch replaces loads by smp_load_acquire() in queue_count().
The observed errors no longer occur.
Reported-by: Zhu Yanjun <zyj2020@xxxxxxxxx>
Fixes: d21a1240f516 ("RDMA/rxe: Use acquire/release for memory ordering")
Signed-off-by: Bob Pearson <rpearsonhpe@xxxxxxxxx>
drivers/infiniband/sw/rxe/rxe_queue.h | 18 ++++++++++++++++--
1 file changed, 16 insertions(+), 2 deletions(-)
diff --git a/drivers/infiniband/sw/rxe/rxe_queue.h b/drivers/infiniband/sw/rxe/rxe_queue.h
index 2902ca7b288c..5cb142282fa6 100644
+++ b/drivers/infiniband/sw/rxe/rxe_queue.h
@@ -161,8 +161,22 @@ static inline unsigned int index_from_addr(const struct rxe_queue *q,
static inline unsigned int queue_count(const struct rxe_queue *q)
{
- return (q->buf->producer_index - q->buf->consumer_index)
- & q->index_mask;
+ u32 prod;
+ u32 cons;
+ u32 count;
+
+ /* make sure all changes to queue complete before
+ * changing producer index
+ */
+ prod = smp_load_acquire(&q->buf->producer_index);
+
+ /* make sure all changes to queue complete before
+ * changing consumer index
+ */
+ cons = smp_load_acquire(&q->buf->consumer_index);
+ count = (prod - cons) % q->index_mask;
+
+ return count;
}
It doesn't quite make sense to load both?
Only the one written by userspace should require a load_acquire, the
one written by the kernel should already be locked somehow with a
kernel lock or there is a different problem
But yes, this does look like a bug to me
Jason
The rxe_queue.h API is used by completion and work queues so both
directions occur. The question is the trade off between the logic to
avoid the load_acquire vs the branching to get around. Same point could
be applied to queue_empty() which already had both load_acquire() calls.
Is it worth creating several versions of APIs with different choices made?
Bob