Re: [PATCH 1/1] Revert "RDMA/rxe: Add workqueue support for rxe tasks"

Bob Pearson <rpearsonhpe@xxxxxxxxx> · Wed, 27 Sep 2023 11:36:04 -0500

On 9/26/23 15:24, Bart Van Assche wrote:
> On 9/26/23 11:34, Bob Pearson wrote:
>> I am working to try to reproduce the KASAN warning. Unfortunately,
>> so far I am not able to see it in Ubuntu + Linus' kernel (as you described) on metal. The config file is different but copies the CONFIG_KASAN_xxx exactly as yours. With KASAN enabled it hangs on every iteration of srp/002 but without a KASAN warning. I am now building an openSuSE VM for qemu and will see if that causes the warning.
> 
> Hi Bob,
> 
> Did you try to understand the report that I shared? 

Looking at the three stack traces from KASAN (alloc, free, and use after free)
it appears that there was an ack packet (skb) created in rxe_responder (normal) and
then passed to rxe_completer which apparently successfully processed it and then
freed the skb (also normal). Then the same skb is enqueued on the response queue
in rxe_comp_queue_pkt(). This is very strange and hard to understand. The only way
the original packet could have been 'completed' would be for it to have been first
enqueued on qp->resp_pkts by skb_queue_tail() and then dequeued after the completer
task runs by skb_dequeue(). The skb queue routines are protected by an irqsave spinlock
so they should operate atomically. In other words the completer can't get the skb until
skb_queue_tail() is finished touching the skb. So it looks like the first pass through
rxe_comp_queue_pkt() shouldn't be to blame. There is no way I can imagine that the
packet could be queued twice on the local loopback path. One strange thing in the
trace is a "? rxe_recv_mcast_pkt" which seems unlikely to be true as all the packets
are rc and hence not mcast. Not sure how to interpret this. Perhaps the stack is
corrupted from scribbles which might cause the above impossibility.

My conclusion from
> the report is that when using tasklets rxe_completer() only runs after
> rxe_requester() has finished and also that when using work queues that
> rxe_completer() may run concurrently with rxe_requester(). 

The completer task was always intended to run in parallel with the requester
and responder tasks whether they are tasklets or workqueue items. Tasklets
tend to run sequentially but there is no reason whey they can't run in
parallel. The completer task is triggered by response packets from another
process's queue pair which is asynchronous from the requester task which
generated the request packets.

For unrelated reasons I am planning to merge the requester task and completer
task into a single task because in high scale situation with lots of qps
it performs better and allows removing some of the locking between them.

This patch
> seems to fix all issues that I ran into with the rdma_rxe workqueue
> patch (I have not tried to verify the performance implications of this
> patch):
> 
> diff --git a/drivers/infiniband/sw/rxe/rxe_task.c b/drivers/infiniband/sw/rxe/rxe_task.c
> index 1501120d4f52..6cd5d5a7a316 100644
> --- a/drivers/infiniband/sw/rxe/rxe_task.c
> +++ b/drivers/infiniband/sw/rxe/rxe_task.c
> @@ -10,7 +10,7 @@ static struct workqueue_struct *rxe_wq;
> 
>  int rxe_alloc_wq(void)
>  {
> -       rxe_wq = alloc_workqueue("rxe_wq", WQ_UNBOUND, WQ_MAX_ACTIVE);
> +       rxe_wq = alloc_workqueue("rxe_wq", WQ_UNBOUND, 1);
>         if (!rxe_wq)
>                 return -ENOMEM;
> 
> Thanks,
> 
> Bart.