On Fri, Sep 20, 2019 at 11:55:56AM +0800, Liuyixian (Eason) wrote:
>
>
> On 2019/9/11 21:17, Liuyixian (Eason) wrote:
> >
> >
> > On 2019/9/10 15:52, Leon Romanovsky wrote:
> >> On Tue, Sep 10, 2019 at 02:40:20PM +0800, Liuyixian (Eason) wrote:
> >>>
> >>>
> >>> On 2019/9/8 16:03, Leon Romanovsky wrote:
> >>>> On Thu, Sep 05, 2019 at 08:31:11PM +0800, Weihang Li wrote:
> >>>>> From: Yixian Liu <liuyixian@xxxxxxxxxx>
> >>>>>
> >>>>> Hip08 has the feature flush cqe, which help to flush wqe in workqueue
> >>>>> (sq and rq) when error happened by transmitting producer index with
> >>>>> mailbox to hardware. Flush cqe is emplemented in post send and recv
> >>>>> verbs. However, under NVMe cases, these verbs will be called under
> >>>>> softirq context, and it will lead to following calltrace with
> >>>>> current driver as mailbox used by flush cqe can go to sleep.
> >>>>>
> >>>>> This patch solves this problem by using workqueue to do flush cqe,
> >>>>
> >>>> Unbelievable, almost every bug in this driver is solved by introducing
> >>>> workqueue. You should fix "sleep in flush path" issue and not by adding
> >>>> new workqueue.
> >>>>
> >>> Hi Leon,
> >>>
> >>> Thanks for the comment.
> >>> Up to now, for hip08, only one place use workqueue in hns_roce_hw_v2.c
> >>> where for irq prints.
> >>
> >> Thanks to our lack of desire to add more workqueues and previous patches
> >> which removed extra workqueues from the driver.
> >>
> > Thanks, I see.
> >
> >>>
> >>> The solution for flush cqe in this patch is as follow:
> >>> While flush cqe should be implement, the driver should modify qp to error state
> >>> through mailbox with the newest product index of sq and rq, the hardware then
> >>> can flush all outstanding wqes in sq and rq.
> >>>
> >>> That's the whole mechanism of flush cqe, also is the flush path. We can't
> >>> change neither mailbox sleep attribute or flush cqe occurred in post send/recv.
> >>> To avoid the calltrace of flush cqe in post verbs under NVMe softirq,
> >>> use workqueue for flush cqe seems reasonable.
> >>>
> >>> As far as I know, there is no other alternative solution for this situation.
> >>> I will be very grateful if you reminder me more information.
> >>
> >> ib_drain_rq/ib_drain_sq/ib_drain_qp????
> >>
> > Hi Leon,
> >
> > I think these interfaces are designed for application to check that all wqes
> > have been processed by hardware, so called drain or flush. However, it is not
> > the same as the flush in this patch. The solution in this patch is used
> > to help the hardware generate flush cqes for outstanding wqes while qp error.
> >
> Hi Leon,
>
> What's your opinion about above? Do you have any further comments?

My opinion didn't change, you need to read discussions about ib_drain_*()
functions, how and why they were introduced. It is a way to go.

Thanks

>
> Thanks.
>
> >>>
> >>> Thanks
> >>>
> >>>> _______________________________________________
> >>>> Linuxarm mailing list
> >>>> Linuxarm@xxxxxxxxxx
> >>>> http://hulk.huawei.com/mailman/listinfo/linuxarm