Re: [PATCH 1/1] Revert "RDMA/rxe: Add workqueue support for rxe tasks"

On 10/4/23 12:44, Bart Van Assche wrote:
> On 9/30/23 23:30, Leon Romanovsky wrote:
>> On Wed, Sep 27, 2023 at 11:51:12AM -0500, Bob Pearson wrote:
>>> On 9/26/23 15:24, Bart Van Assche wrote:
>>>> diff --git a/drivers/infiniband/sw/rxe/rxe_task.c b/drivers/infiniband/sw/rxe/rxe_task.c
>>>> index 1501120d4f52..6cd5d5a7a316 100644
>>>> --- a/drivers/infiniband/sw/rxe/rxe_task.c
>>>> +++ b/drivers/infiniband/sw/rxe/rxe_task.c
>>>> @@ -10,7 +10,7 @@ static struct workqueue_struct *rxe_wq;
>>>>
>>>>   int rxe_alloc_wq(void)
>>>>   {
>>>> -       rxe_wq = alloc_workqueue("rxe_wq", WQ_UNBOUND, WQ_MAX_ACTIVE);
>>>> +       rxe_wq = alloc_workqueue("rxe_wq", WQ_UNBOUND, 1);
>>>>          if (!rxe_wq)
>>>>                  return -ENOMEM;
>>>>
>>>> Thanks,
>>>>
>>>> Bart.
>>
>> <...>
>>
>>> Nevertheless this is a good hint since it seems to imply that there is a race between the requester and
>>> completer which is certainly possible.
>>
>> Bob, Bart
>>
>> Can you please send this change as a formal patch?
>> As we prefer workqueue with bad performance implementation over tasklets.
> 
> Hi Bob,
> 
> Do you perhaps have a preference for who posts the formal patch?
> 
> Thanks,
> 
> Bart.
> 

Bart,

Not really.

I have spent the past two weeks chasing this bug and don't have much to report. I have never been able to
reproduce your KASAN bug. Like Zhu, I have found that the hang is always there, but its frequency varies a
lot depending on what is changed. For example, adding various printk's can increase or decrease the frequency.

I spent this morning looking at flame graphs captured during the hang, which lasts about 60 seconds before
the test times out and check tears it down. One of them is attached to this note. There seems to be a lot of
recursion in what I assume is some attempt at error recovery. The recursion is probably in user space because
the symbols are not available to perf.

I am worried that this recursion could overflow the stack, which could cause bad behavior.

Bob

Attachment: perf-kernel.svg
Description: image/svg

