On 9/26/23 15:24, Bart Van Assche wrote:
> On 9/26/23 11:34, Bob Pearson wrote:
>> I am working to try to reproduce the KASAN warning. Unfortunately,
>> so far I am not able to see it in Ubuntu + Linus' kernel (as you
>> described) on metal. The config file is different but copies the
>> CONFIG_KASAN_xxx exactly as yours. With KASAN enabled it hangs on
>> every iteration of srp/002 but without a KASAN warning. I am now
>> building an openSuSE VM for qemu and will see if that causes the
>> warning.
>
> Hi Bob,
>
> Did you try to understand the report that I shared? My conclusion from
> the report is that when using tasklets rxe_completer() only runs after
> rxe_requester() has finished and also that when using work queues that
> rxe_completer() may run concurrently with rxe_requester(). This patch
> seems to fix all issues that I ran into with the rdma_rxe workqueue
> patch (I have not tried to verify the performance implications of this
> patch):
>
> diff --git a/drivers/infiniband/sw/rxe/rxe_task.c b/drivers/infiniband/sw/rxe/rxe_task.c
> index 1501120d4f52..6cd5d5a7a316 100644
> --- a/drivers/infiniband/sw/rxe/rxe_task.c
> +++ b/drivers/infiniband/sw/rxe/rxe_task.c
> @@ -10,7 +10,7 @@ static struct workqueue_struct *rxe_wq;
>
>  int rxe_alloc_wq(void)
>  {
> -	rxe_wq = alloc_workqueue("rxe_wq", WQ_UNBOUND, WQ_MAX_ACTIVE);
> +	rxe_wq = alloc_workqueue("rxe_wq", WQ_UNBOUND, 1);
>  	if (!rxe_wq)
>  		return -ENOMEM;
>
> Thanks,
>
> Bart.

The workqueue doc says:

    Some users depend on the strict execution ordering of ST wq. The
    combination of @max_active of 1 and WQ_UNBOUND is used to achieve
    this behavior. Work items on such wq are always queued to the
    unbound worker-pools and only one work item can be active at any
    given time thus achieving the same ordering property as ST wq.

When I have tried this setting I see very low performance compared to
512. It seems that only one item at a time can run on all the CPUs even
though it also says that max_active is the number of threads per cpu.

Nevertheless this is a good hint since it seems to imply that there is
a race between the requester and completer which is certainly possible.

Bob