Re: [PATCH 1/1] Revert "RDMA/rxe: Add workqueue support for rxe tasks"


 




On 2023/10/4 8:46, Zhu Yanjun wrote:

On 2023/10/4 2:11, Leon Romanovsky wrote:
On Tue, Oct 03, 2023 at 11:29:42PM +0800, Zhu Yanjun wrote:
On 2023/10/3 17:59, Leon Romanovsky wrote:
On Tue, Oct 03, 2023 at 04:55:40PM +0800, Zhu Yanjun wrote:
On 2023/10/1 14:50, Leon Romanovsky wrote:
On Sun, Oct 1, 2023, at 09:47, Zhu Yanjun wrote:
On 2023/10/1 14:39, Leon Romanovsky wrote:
On Sun, Oct 1, 2023, at 09:34, Zhu Yanjun wrote:
On 2023/10/1 14:30, Leon Romanovsky wrote:
On Wed, Sep 27, 2023 at 11:51:12AM -0500, Bob Pearson wrote:
On 9/26/23 15:24, Bart Van Assche wrote:
On 9/26/23 11:34, Bob Pearson wrote:
I am trying to reproduce the KASAN warning. Unfortunately, so far I am not able to see it with Ubuntu + Linus' kernel (as you described) on metal. The config file is different but copies your CONFIG_KASAN_xxx settings exactly. With KASAN enabled it hangs on every iteration of srp/002, but without a KASAN warning. I am now building an openSUSE VM for qemu and will see if that triggers the warning.
Hi Bob,

Did you try to understand the report that I shared? My conclusion from the report is that, when using tasklets, rxe_completer() only runs after rxe_requester() has finished, and that, when using work queues, rxe_completer() may run concurrently with rxe_requester(). This patch seems to fix all the issues that I ran into with the rdma_rxe workqueue patch (I have not tried to verify the performance implications of this patch):

diff --git a/drivers/infiniband/sw/rxe/rxe_task.c b/drivers/infiniband/sw/rxe/rxe_task.c
index 1501120d4f52..6cd5d5a7a316 100644
--- a/drivers/infiniband/sw/rxe/rxe_task.c
+++ b/drivers/infiniband/sw/rxe/rxe_task.c
@@ -10,7 +10,7 @@ static struct workqueue_struct *rxe_wq;

 int rxe_alloc_wq(void)
 {
-	rxe_wq = alloc_workqueue("rxe_wq", WQ_UNBOUND, WQ_MAX_ACTIVE);
+	rxe_wq = alloc_workqueue("rxe_wq", WQ_UNBOUND, 1);
 	if (!rxe_wq)
 		return -ENOMEM;
With this commit, a test ran for several days. A similar problem still occurred.

The problem is very similar to the one that Bart mentioned.

It is very possible that, with WQ_MAX_ACTIVE changed to 1, this problem is only alleviated.

In alloc_workqueue() (kernel/workqueue.c):

__printf(1, 4)
struct workqueue_struct *alloc_workqueue(const char *fmt,
					 unsigned int flags,
					 int max_active, ...)
{
	va_list args;
	struct workqueue_struct *wq;
	struct pool_workqueue *pwq;

	/*
	 * Unbound && max_active == 1 used to imply ordered, which is no longer
	 * the case on many machines due to per-pod pools. While
	 * alloc_ordered_workqueue() is the right way to create an ordered
	 * workqueue, keep the previous behavior to avoid subtle breakages.
	 */
	if ((flags & WQ_UNBOUND) && max_active == 1)	/* <-- this makes the workqueue ordered */
		flags |= __WQ_ORDERED;
...

Does this mean that the ordered workqueue only covers up the root cause? When the workqueue is changed to an ordered one, it is difficult to reproduce this problem.
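
For reference, below is a minimal sketch (not part of the patch; the helper name demo_alloc_rxe_wq() is made up) of the three allocation variants discussed in this thread. Per the comment quoted above, alloc_ordered_workqueue() is the documented way to request ordered behavior explicitly:

#include <linux/workqueue.h>

/* Illustration only, not the actual rxe code. */
static struct workqueue_struct *demo_alloc_rxe_wq(void)
{
	/* Original workqueue patch: up to WQ_MAX_ACTIVE concurrent work items. */
	/* return alloc_workqueue("rxe_wq", WQ_UNBOUND, WQ_MAX_ACTIVE); */

	/*
	 * Bart's change: max_active == 1.  With the kernel code quoted above,
	 * WQ_UNBOUND plus max_active == 1 also sets __WQ_ORDERED, so the
	 * workqueue becomes ordered.
	 */
	/* return alloc_workqueue("rxe_wq", WQ_UNBOUND, 1); */

	/* The explicit way to request an ordered workqueue. */
	return alloc_ordered_workqueue("rxe_wq", 0);
}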

Got it.

Is there any way to ensure the following: if a mail does not appear on the rdma mailing list, it will not be reviewed?


Sorry, my bad. I used the wrong rdma mailing list.




The analysis is as follows:

Because a work item on a workqueue runs in process context, it can sleep when it is preempted, and sometimes the sleep time will exceed the timeout of the rdma packets. As such, the rdma stack or the ULP will OOM or hang. This is why the workqueue can cause a ULP hang.

But a tasklet will not sleep, so this kind of problem does not occur with tasklets.
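
To make the execution-context difference concrete, here is a minimal, hypothetical sketch (the demo_* names are made up; this is not the actual rxe_task.c code) contrasting the two deferral mechanisms:

#include <linux/interrupt.h>
#include <linux/workqueue.h>
#include <linux/types.h>

struct demo_task {
	struct tasklet_struct tl;	/* runs in softirq (atomic) context */
	struct work_struct work;	/* runs in process context */
};

static void demo_tasklet_fn(struct tasklet_struct *t)
{
	/*
	 * Atomic context: no sleeping, no blocking locks.  The handler runs
	 * to completion on the CPU that scheduled it, so it cannot be delayed
	 * by being put to sleep.
	 */
}

static void demo_work_fn(struct work_struct *w)
{
	/*
	 * Process context: the worker may sleep or be preempted, so on a
	 * heavily loaded system it can be delayed long enough to trip RDMA
	 * retry/timeout logic, as described above.
	 */
}

static void demo_task_init(struct demo_task *d)
{
	tasklet_setup(&d->tl, demo_tasklet_fn);
	INIT_WORK(&d->work, demo_work_fn);
}

static void demo_task_kick(struct demo_task *d, bool use_wq)
{
	if (use_wq)
		schedule_work(&d->work);	/* or queue_work() on a private wq */
	else
		tasklet_schedule(&d->tl);
}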

About the performance: currently an ordered workqueue can execute at most one work item at any given time, in the queued order. So in RXE the workqueue will not execute more jobs than the tasklet.
That is because max_active was changed to 1. Once that bug is fixed, RXE will be able to spread traffic across all CPUs.


Sure. I agree with you.


After max_active is changed to 1, the workqueue is an ordered workqueue.

An ordered workqueue executes its work items one by one, possibly on different CPUs; that is, after one work item completes, the ordered workqueue executes the next one, in the queued order, on whichever CPU is available. A tasklet executes its jobs one by one on the same CPU.

So if the total number of jobs is the same, the ordered workqueue will have about the same execution time as the tasklet, but the ordered workqueue has more scheduling overhead than the tasklet.

In total, the performance of the ordered workqueue is not good compared with the tasklet.
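
As a side note, the serialization behavior described above can be observed with a tiny, hypothetical test module (sketch only; the demo_* names are made up and this is not part of the rxe driver): work items queued on an ordered workqueue run strictly one at a time, in queueing order, though successive items may land on different CPUs.

#include <linux/module.h>
#include <linux/workqueue.h>
#include <linux/delay.h>
#include <linux/smp.h>

static struct workqueue_struct *demo_wq;
static struct work_struct demo_work[4];

static void demo_fn(struct work_struct *w)
{
	int i = w - demo_work;

	/*
	 * On an ordered workqueue these callbacks never overlap and run in
	 * queueing order, but the reported CPU may differ between items.
	 */
	pr_info("demo: item %d on CPU %d\n", i, raw_smp_processor_id());
	msleep(100);	/* process context: sleeping is allowed */
}

static int __init demo_init(void)
{
	int i;

	demo_wq = alloc_ordered_workqueue("demo_wq", 0);
	if (!demo_wq)
		return -ENOMEM;

	for (i = 0; i < 4; i++) {
		INIT_WORK(&demo_work[i], demo_fn);
		queue_work(demo_wq, &demo_work[i]);
	}
	return 0;
}

static void __exit demo_exit(void)
{
	destroy_workqueue(demo_wq);	/* flushes any pending work first */
}

module_init(demo_init);
module_exit(demo_exit);
MODULE_LICENSE("GPL");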

Zhu Yanjun


