Re: [PATCH for-next] RDMA/hns: Bugfix for flush cqe in case softirq and multi-process

On 2019/10/28 17:34, Liuyixian (Eason) wrote:
> 
> 
> On 2019/10/15 16:00, Leon Romanovsky wrote:
>> On Sat, Oct 12, 2019 at 11:53:36AM +0800, Liuyixian (Eason) wrote:
>>>
>>>
>>> On 2019/9/24 11:54, Liuyixian (Eason) wrote:
>>>>
>>>>
>>>> On 2019/9/23 13:01, Leon Romanovsky wrote:
>>>>> On Fri, Sep 20, 2019 at 11:55:56AM +0800, Liuyixian (Eason) wrote:
>>>>>>
>>>>>>
>>>>>> On 2019/9/11 21:17, Liuyixian (Eason) wrote:
>>>>>>>
>>>>>>>
>>>>>>> On 2019/9/10 15:52, Leon Romanovsky wrote:
>>>>>>>> On Tue, Sep 10, 2019 at 02:40:20PM +0800, Liuyixian (Eason) wrote:
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On 2019/9/8 16:03, Leon Romanovsky wrote:
>>>>>>>>>> On Thu, Sep 05, 2019 at 08:31:11PM +0800, Weihang Li wrote:
>>>>>>>>>>> From: Yixian Liu <liuyixian@xxxxxxxxxx>
>>>>>>>>>>>
>>>>>>>>>>> Hip08 has a flush cqe feature, which helps to flush wqes in the work queues
>>>>>>>>>>> (sq and rq) when an error happens by transmitting the producer index to
>>>>>>>>>>> hardware through the mailbox. Flush cqe is implemented in the post send and
>>>>>>>>>>> recv verbs. However, in NVMe cases these verbs are called in softirq
>>>>>>>>>>> context, which leads to the following calltrace with the current driver,
>>>>>>>>>>> because the mailbox used by flush cqe can sleep.
>>>>>>>>>>>
>>>>>>>>>>> This patch solves this problem by using workqueue to do flush cqe,
>>>>>>>>>>
>>>>>>>>>> Unbelievable, almost every bug in this driver is solved by introducing
>>>>>>>>>> workqueue. You should fix "sleep in flush path" issue and not by adding
>>>>>>>>>> new workqueue.
>>>>>>>>>>
>>>>>>>>> Hi Leon,
>>>>>>>>>
>>>>>>>>> Thanks for the comment.
>>>>>>>>> Up to now, for hip08, only one place in hns_roce_hw_v2.c uses a workqueue,
>>>>>>>>> and that is for irq prints.
>>>>>>>>
>>>>>>>> Thanks to our lack of desire to add more workqueues and previous patches
>>>>>>>> which removed extra workqueues from the driver.
>>>>>>>>
>>>>>>> Thanks, I see.
>>>>>>>
>>>>>>>>>
>>>>>>>>> The solution for flush cqe in this patch is as follows:
>>>>>>>>> When flush cqe needs to be performed, the driver modifies the qp to the error
>>>>>>>>> state through the mailbox with the newest producer index of the sq and rq, and
>>>>>>>>> the hardware can then flush all outstanding wqes in the sq and rq.
>>>>>>>>>
>>>>>>>>> That is the whole mechanism of flush cqe, and it is also the flush path. We can
>>>>>>>>> change neither the sleeping behavior of the mailbox nor the fact that flush cqe
>>>>>>>>> occurs in post send/recv. To avoid the calltrace of flush cqe in post verbs
>>>>>>>>> under NVMe softirq, using a workqueue for flush cqe seems reasonable.
>>>>>>>>>
>>>>>>>>> As far as I know, there is no alternative solution for this situation.
>>>>>>>>> I would be very grateful if you could point me to more information.
>>>>>>>>
>>>>>>>> ib_drain_rq/ib_drain_sq/ib_drain_qp????
>>>>>>>>
>>>>>>> Hi Leon,
>>>>>>>
>>>>>>> I think these interfaces are designed for applications to check that all wqes
>>>>>>> have been processed by hardware, the so-called drain or flush. However, that is
>>>>>>> not the same as the flush in this patch. The solution in this patch is used
>>>>>>> to help the hardware generate flush cqes for outstanding wqes when the qp is
>>>>>>> in the error state.
>>>>>>>
>>>>>> Hi Leon,
>>>>>>
>>>>>> What's your opinion on the above? Do you have any further comments?
>>>>>
>>>>> My opinion didn't change, you need to read discussions about ib_drain_*()
>>>>> functions, how and why they were introduced. It is a way to go.
>>>>>
>>>>> Thanks
>>>>
>>>> Hi Leon,
>>>>
>>>> Thanks a lot! I will dig into those functions for my problem.
>>>>
>>>
>>> Hi Leon,
>>>
>>> I have analyzed the mechanism of ib_drain_(qp, sq, rq). It is okay to use
>>> it instead of our flush cqe, as both of them modify the qp to the error
>>> state in the flush path.
>>>
>>> However, both ib_drain_* and flush cqe face the same problem described
>>> in previous emails: in the NVMe case, post verbs are called under
>>> **softirq**, which results in a calltrace because the mailbox used in modify qp
>>> (flush path) can sleep, and sleeping is not allowed under softirq.
>>>
>>> Thus, to resolve the above calltrace (sleep in softirq), using a workqueue as in
>>> this patch seems to be a reasonable solution, regardless of whether ib_drain_qp
>>> or flush cqe is called in the workqueue.
>>>
>>> I think it is not a good idea to fix the sleep in the flush path (which actually
>>> refers to the mailbox used in modify qp), as the mailbox is such a mature mechanism.
>>
>> No, it is not a reasonable solution.
>>
> 
> Hi Leon,
> 
>      I have explained this issue better in another patch set and pruned other logic.
>      Thanks a lot for your review!
> 
> Best regards.
> Eason
> 

Hi Doug and Leon,

I just want to make sure you know that the above-mentioned patch set is at:
https://patchwork.kernel.org/project/linux-rdma/list/?series=194423

Sorry for replying to your last comment so late. I analyzed all possible solutions
in light of your comment and found that I had not described our problem clearly or
accurately enough. I therefore made this new patch set with simpler logic and a more
detailed commit message. I hope it explains the problem clearly.

Thanks.






