On 9/26/23 15:24, Bart Van Assche wrote:
> On 9/26/23 11:34, Bob Pearson wrote:
>> I am working to try to reproduce the KASAN warning. Unfortunately,
>> so far I am not able to see it in Ubuntu + Linus' kernel (as you described) on metal. The config file is different but copies the CONFIG_KASAN_xxx exactly as yours. With KASAN enabled it hangs on every iteration of srp/002 but without a KASAN warning. I am now building an openSuSE VM for qemu and will see if that causes the warning.
>
> Hi Bob,
>
> Did you try to understand the report that I shared?

Looking at the three stack traces from KASAN (alloc, free, and use after free) it appears that there was an ack packet (skb) created in rxe_responder (normal) and then passed to rxe_completer which apparently successfully processed it and then freed the skb (also normal). Then the same skb is enqueued on the response queue in rxe_comp_queue_pkt(). This is very strange and hard to understand.

The only way the original packet could have been 'completed' would be for it to have been first enqueued on qp->resp_pkts by skb_queue_tail() and then dequeued after the completer task runs by skb_dequeue(). The skb queue routines are protected by an irqsave spinlock so they should operate atomically. In other words the completer can't get the skb until skb_queue_tail() is finished touching the skb. So it looks like the first pass through rxe_comp_queue_pkt() shouldn't be to blame. There is no way I can imagine that the packet could be queued twice on the local loopback path.

One strange thing in the trace is a "? rxe_recv_mcast_pkt" which seems unlikely to be true as all the packets are rc and hence not mcast. Not sure how to interpret this. Perhaps the stack is corrupted from scribbles which might cause the above impossibility.

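To make the locking argument concrete, here is a rough userspace sketch of the enqueue/dequeue pattern. It is not the kernel code: struct pkt, struct pkt_queue, pkt_queue_tail() and pkt_dequeue() are hypothetical stand-ins for sk_buff, sk_buff_head, skb_queue_tail() and skb_dequeue(), and a pthread mutex stands in for the irqsave spinlock. The point is only that a packet becomes visible to the consumer inside the locked region, after it has been fully linked in.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical stand-in for struct sk_buff; not the kernel type. */
struct pkt {
	struct pkt *next;
	int id;
};

/* Hypothetical stand-in for struct sk_buff_head. */
struct pkt_queue {
	pthread_mutex_t lock;	/* models the irqsave spinlock */
	struct pkt *head;
	struct pkt *tail;
	unsigned int qlen;
};

static void pkt_queue_init(struct pkt_queue *q)
{
	pthread_mutex_init(&q->lock, NULL);
	q->head = NULL;
	q->tail = NULL;
	q->qlen = 0;
}

/* Models skb_queue_tail(): the packet is linked in under the lock. */
static void pkt_queue_tail(struct pkt_queue *q, struct pkt *p)
{
	assert(p != NULL);

	pthread_mutex_lock(&q->lock);
	p->next = NULL;
	if (q->tail)
		q->tail->next = p;
	else
		q->head = p;
	q->tail = p;
	q->qlen++;
	pthread_mutex_unlock(&q->lock);
}

/* Models skb_dequeue(): returns NULL if the queue is empty. */
static struct pkt *pkt_dequeue(struct pkt_queue *q)
{
	struct pkt *p;

	pthread_mutex_lock(&q->lock);
	p = q->head;
	if (p) {
		q->head = p->next;
		if (!q->head)
			q->tail = NULL;
		p->next = NULL;
		q->qlen--;
	}
	pthread_mutex_unlock(&q->lock);
	return p;
}
```

In this model a consumer calling pkt_dequeue() either sees an empty queue or a packet the producer has finished linking in, which is the property I am relying on above. It says nothing about what happens to the packet after it has been dequeued and freed.
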
> My conclusion from
> the report is that when using tasklets rxe_completer() only runs after
> rxe_requester() has finished and also that when using work queues that
> rxe_completer() may run concurrently with rxe_requester().

The completer task was always intended to run in parallel with the requester and responder tasks whether they are tasklets or workqueue items. Tasklets tend to run sequentially but there is no reason why they can't run in parallel. The completer task is triggered by response packets from another process's queue pair which is asynchronous from the requester task which generated the request packets.

For unrelated reasons I am planning to merge the requester task and completer task into a single task because in high scale situations with lots of qps it performs better and allows removing some of the locking between them.

> This patch
> seems to fix all issues that I ran into with the rdma_rxe workqueue
> patch (I have not tried to verify the performance implications of this
> patch):
>
> diff --git a/drivers/infiniband/sw/rxe/rxe_task.c b/drivers/infiniband/sw/rxe/rxe_task.c
> index 1501120d4f52..6cd5d5a7a316 100644
> --- a/drivers/infiniband/sw/rxe/rxe_task.c
> +++ b/drivers/infiniband/sw/rxe/rxe_task.c
> @@ -10,7 +10,7 @@ static struct workqueue_struct *rxe_wq;
>
>  int rxe_alloc_wq(void)
>  {
> -	rxe_wq = alloc_workqueue("rxe_wq", WQ_UNBOUND, WQ_MAX_ACTIVE);
> +	rxe_wq = alloc_workqueue("rxe_wq", WQ_UNBOUND, 1);
>  	if (!rxe_wq)
>  		return -ENOMEM;
>
> Thanks,
>
> Bart.