On 2023/9/27 4:24, Bart Van Assche wrote:
On 9/26/23 11:34, Bob Pearson wrote:
I am working to try to reproduce the KASAN warning. Unfortunately,
so far I am not able to see it in Ubuntu + Linus' kernel (as you
described) on metal. The config file is different but copies the
CONFIG_KASAN_xxx exactly as yours. With KASAN enabled it hangs on
every iteration of srp/002 but without a KASAN warning. I am now
building an openSuSE VM for qemu and will see if that causes the warning.
Hi Bob,
Did you try to understand the report that I shared? My conclusion from
the report is that when using tasklets rxe_completer() only runs after
rxe_requester() has finished, and that when using work queues
rxe_completer() may run concurrently with rxe_requester(). This patch
seems to fix all issues that I ran into with the rdma_rxe workqueue
patch (I have not tried to verify the performance implications of this
patch):
diff --git a/drivers/infiniband/sw/rxe/rxe_task.c b/drivers/infiniband/sw/rxe/rxe_task.c
index 1501120d4f52..6cd5d5a7a316 100644
--- a/drivers/infiniband/sw/rxe/rxe_task.c
+++ b/drivers/infiniband/sw/rxe/rxe_task.c
@@ -10,7 +10,7 @@ static struct workqueue_struct *rxe_wq;
 int rxe_alloc_wq(void)
 {
-	rxe_wq = alloc_workqueue("rxe_wq", WQ_UNBOUND, WQ_MAX_ACTIVE);
+	rxe_wq = alloc_workqueue("rxe_wq", WQ_UNBOUND, 1);
 	if (!rxe_wq)
 		return -ENOMEM;
Hi, Bart
With the above change applied, I still hit a similar problem, although
it occurs very rarely. With the following change, the problem has not
occurred so far.
diff --git a/drivers/infiniband/sw/rxe/rxe_task.c b/drivers/infiniband/sw/rxe/rxe_task.c
index 1501120d4f52..3189c3705295 100644
--- a/drivers/infiniband/sw/rxe/rxe_task.c
+++ b/drivers/infiniband/sw/rxe/rxe_task.c
@@ -10,7 +10,7 @@ static struct workqueue_struct *rxe_wq;
 int rxe_alloc_wq(void)
 {
-	rxe_wq = alloc_workqueue("rxe_wq", WQ_UNBOUND, WQ_MAX_ACTIVE);
+	rxe_wq = alloc_workqueue("rxe_wq", WQ_HIGHPRI | WQ_UNBOUND, 1);
 	if (!rxe_wq)
 		return -ENOMEM;
And with the tasklet, this problem also does not occur.
With "alloc_workqueue("rxe_wq", WQ_HIGHPRI | WQ_UNBOUND, 1);", a
high-priority ordered workqueue is allocated.
For the same number of work items, the ordered workqueue has the same
running time as the tasklet. However, the tasklet is based on softirq,
so its scheduling overhead is lower than that of a workqueue. In theory,
then, the tasklet should perform better than the ordered workqueue.
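To illustrate why limiting the workqueue to one active item avoids
rxe_completer() and rxe_requester() overlapping, here is a minimal
user-space sketch. It is only an analogy, not kernel code: the names
peak_concurrency(), run_item() and worker() are hypothetical, and
mapping max_active directly onto a number of pthreads is a simplifying
assumption. It records the highest number of work items observed
running at the same time.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>

#define NUM_ITEMS   64   /* number of simulated work items (arbitrary) */
#define MAX_WORKERS 16   /* upper bound on simulated worker threads */

static atomic_int in_flight;      /* items executing right now */
static atomic_int max_in_flight;  /* highest value in_flight ever reached */
static atomic_int next_item;      /* index of the next item to hand out */

/* Simulated work item: track how many items overlap in time. */
static void run_item(void)
{
	int now = atomic_fetch_add(&in_flight, 1) + 1;
	int seen = atomic_load(&max_in_flight);

	/* Raise the recorded peak if this item pushed it higher. */
	while (now > seen &&
	       !atomic_compare_exchange_weak(&max_in_flight, &seen, now))
		;

	for (volatile int i = 0; i < 10000; i++)
		;	/* stand-in for real work */

	atomic_fetch_sub(&in_flight, 1);
}

/* Each worker pulls items until none are left. */
static void *worker(void *arg)
{
	(void)arg;
	while (atomic_fetch_add(&next_item, 1) < NUM_ITEMS)
		run_item();
	return NULL;
}

/*
 * Run NUM_ITEMS items on at most max_active workers (an assumed
 * stand-in for the max_active argument) and return the peak number
 * of items that were running simultaneously.
 */
int peak_concurrency(int max_active)
{
	pthread_t threads[MAX_WORKERS];
	int created = 0;

	if (max_active < 1)
		max_active = 1;
	if (max_active > MAX_WORKERS)
		max_active = MAX_WORKERS;

	atomic_store(&in_flight, 0);
	atomic_store(&max_in_flight, 0);
	atomic_store(&next_item, 0);

	while (created < max_active &&
	       pthread_create(&threads[created], NULL, worker, NULL) == 0)
		created++;

	if (created == 0)
		worker(NULL);	/* fall back to running inline */

	for (int i = 0; i < created; i++)
		pthread_join(threads[i], NULL);

	assert(atomic_load(&in_flight) == 0);	/* every item finished */
	return atomic_load(&max_in_flight);
}
```

With this sketch, peak_concurrency(1) always reports 1, because a
single worker runs the items strictly one after another. A larger
argument allows overlap but does not guarantee it on any given run,
which would fit with the problem showing up only rarely.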
Best Regards,
Zhu Yanjun
Thanks,
Bart.