Re: [PATCH for-next] RDMA/hns: Bugfix for flush cqe in case softirq and multi-process

On Fri, Sep 20, 2019 at 11:55:56AM +0800, Liuyixian (Eason) wrote:
>
>
> On 2019/9/11 21:17, Liuyixian (Eason) wrote:
> >
> >
> > On 2019/9/10 15:52, Leon Romanovsky wrote:
> >> On Tue, Sep 10, 2019 at 02:40:20PM +0800, Liuyixian (Eason) wrote:
> >>>
> >>>
> >>> On 2019/9/8 16:03, Leon Romanovsky wrote:
> >>>> On Thu, Sep 05, 2019 at 08:31:11PM +0800, Weihang Li wrote:
> >>>>> From: Yixian Liu <liuyixian@xxxxxxxxxx>
> >>>>>
> >>>>> Hip08 has the flush cqe feature, which helps to flush wqes in the work
> >>>>> queues (sq and rq) when an error happens by transmitting the producer index
> >>>>> to hardware through the mailbox. Flush cqe is implemented in the post send
> >>>>> and recv verbs. However, under NVMe cases, these verbs will be called under
> >>>>> softirq context, and this will lead to the following calltrace with the
> >>>>> current driver, as the mailbox used by flush cqe can go to sleep.
> >>>>>
> >>>>> This patch solves this problem by using workqueue to do flush cqe,
> >>>>
> >>>> Unbelievable, almost every bug in this driver is solved by introducing
> >>>> workqueue. You should fix "sleep in flush path" issue and not by adding
> >>>> new workqueue.
> >>>>
> >>> Hi Leon,
> >>>
> >>> Thanks for the comment.
> >>> Up to now, for hip08, only one place uses a workqueue in hns_roce_hw_v2.c,
> >>> and that is for irq prints.
> >>
> >> Thanks to our lack of desire to add more workqueues and previous patches
> >> which removed extra workqueues from the driver.
> >>
> > Thanks, I see.
> >
> >>>
> >>> The solution for flush cqe in this patch is as follows:
> >>> When flush cqe needs to be performed, the driver should modify the qp to error
> >>> state through the mailbox with the newest producer index of sq and rq, so that
> >>> the hardware can flush all outstanding wqes in sq and rq.
> >>>
> >>> That's the whole mechanism of flush cqe, and it is also the flush path. We can
> >>> change neither the mailbox's sleep attribute nor the fact that flush cqe occurs
> >>> in post send/recv. To avoid the calltrace of flush cqe in post verbs under NVMe
> >>> softirq, using a workqueue for flush cqe seems reasonable.
> >>>
> >>> As far as I know, there is no other alternative solution for this situation.
> >>> I would be very grateful if you could point me to more information.
> >>
> >> ib_drain_rq/ib_drain_sq/ib_drain_qp????
> >>
> > Hi Leon,
> >
> > I think these interfaces are designed for applications to check that all wqes
> > have been processed by hardware, the so-called drain or flush. However, that is
> > not the same as the flush in this patch. The solution in this patch is used
> > to help the hardware generate flush cqes for outstanding wqes when the qp is in
> > error state.
> >
> Hi Leon,
>
> What's your opinion about the above? Do you have any further comments?

My opinion didn't change; you need to read the discussions about the ib_drain_*()
functions, how and why they were introduced. That is the way to go.

Thanks

>
> Thanks.
>
> >>>
> >>> Thanks
> >>>
> >>>> _______________________________________________
> >>>> Linuxarm mailing list
> >>>> Linuxarm@xxxxxxxxxx
> >>>> http://hulk.huawei.com/mailman/listinfo/linuxarm
> >>>>
> >>>>
> >>>
> >>
> >> .
> >>
> >
>


