Hi Jason,

I want to check whether there are any further comments on this patch set?

Thanks!
Eason

On 2019/11/21 19:19, Yixian Liu wrote:
> Earlier Background:
> HiP08 RoCE hardware lacks the ability (a known hardware problem) to flush
> outstanding WQEs if the QP state gets into errored mode for some reason.
> To overcome this hardware problem and as a workaround, when the QP is
> detected to be in errored state during various legs like post send,
> post receive etc. [1], the flush needs to be performed from the driver.
>
> These data-path legs might get called concurrently from various contexts,
> like thread and interrupt as well (like the NVMe driver). Hence, these need
> to be protected with spin-locks for the concurrency. This code exists
> within the driver.
>
> Problem:
> The earlier patch [1] sent to solve the hardware limitation explained
> in the background section had a bug in the software flushing leg. It
> acquired a mutex while modifying the QP state to errored state and while
> conveying it to the hardware using the mailbox. This caused the leg to
> sleep while holding a spin-lock and caused a crash.
>
> Suggested Solution:
> In this patch, we have proposed to defer the flushing of the QP in
> errored state using the workqueue.
>
> We do understand that this might have an impact on the recovery times
> as scheduling of the workqueue handler depends upon the occupancy of
> the system. Therefore, to roughly mitigate this effect, we have tried
> to use a Concurrency Managed workqueue to give the worker thread (and
> hence the handler) a chance to run over more than one core.
>
>
> [1] https://patchwork.kernel.org/patch/10534271/
>
>
> This patch-set consists of:
> [Patch 001] Introduce workqueue based WQE Flush Handler
> [Patch 002] Call WQE flush handler in post {send|receive|poll}
>
> v3 changes:
> 1. Fall back to dynamically allocate flush_work.
>
> v2 changes:
> 1. Remove newly created workqueue according to Jason's comment
> 2. Remove dynamic allocation for flush_work according to Jason's comment
> 3. Change current irq singlethread workqueue to concurrency management
>    workqueue to ensure work unblocked.
>
> Yixian Liu (2):
>   RDMA/hns: Add the workqueue framework for flush cqe handler
>   RDMA/hns: Delayed flush cqe process with workqueue
>
>  drivers/infiniband/hw/hns/hns_roce_device.h |  2 +
>  drivers/infiniband/hw/hns/hns_roce_hw_v2.c  | 88 +++++++++++++----------------
>  drivers/infiniband/hw/hns/hns_roce_qp.c     | 43 ++++++++++++++
>  3 files changed, 85 insertions(+), 48 deletions(-)
>
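
For readers following the thread, the deferral described in the cover letter can be sketched as a small userspace model. This is only an illustration under stated assumptions, not the code from the patches: every name (demo_qp, demo_post_send, demo_queue_work and so on) is hypothetical, the "spin-lock" is modelled by a plain flag, and the "workqueue" is a simple list drained by hand. It shows the data-path leg scheduling the flush while the lock is held, and the sleeping state change running later from a context where no lock is held.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* All names below are hypothetical; none come from the hns driver. */

struct demo_work {
    void (*fn)(struct demo_work *w);  /* handler run by the worker */
    struct demo_work *next;
    bool pending;                     /* already queued, not yet run */
};

struct demo_qp {
    bool in_error;                    /* QP was detected in errored state */
    int flushed;                      /* number of flushes performed */
    struct demo_work flush_work;      /* embedded work item */
};

static bool demo_lock_held;           /* models "spin-lock held, must not sleep" */
static struct demo_work *demo_queue;  /* models the workqueue as a simple list */

/* Models the mutex + mailbox path: only legal when no spin-lock is held. */
static void demo_modify_qp_to_error(struct demo_qp *qp)
{
    assert(!demo_lock_held);
    qp->flushed++;
}

/* Work handler: recover the QP from the embedded work item, then flush. */
static void demo_flush_handler(struct demo_work *w)
{
    struct demo_qp *qp =
        (struct demo_qp *)((char *)w - offsetof(struct demo_qp, flush_work));

    demo_modify_qp_to_error(qp);
}

/* Queue the work once; a request while it is still pending is a no-op. */
static void demo_queue_work(struct demo_work *w)
{
    if (w->pending)
        return;
    w->pending = true;
    w->next = demo_queue;
    demo_queue = w;
}

/* Data-path leg: runs with the lock held, so it only schedules the flush. */
static void demo_post_send(struct demo_qp *qp)
{
    demo_lock_held = true;
    if (qp->in_error)
        demo_queue_work(&qp->flush_work);
    demo_lock_held = false;
}

/* Worker context: no lock is held here, so handlers are allowed to sleep. */
static void demo_run_worker(void)
{
    while (demo_queue) {
        struct demo_work *w = demo_queue;

        demo_queue = w->next;
        w->pending = false;
        w->fn(w);
    }
}

/* Two posts on an errored QP, then one worker pass; returns the flush count. */
int demo_run(void)
{
    struct demo_qp qp = { .in_error = true };

    qp.flush_work.fn = demo_flush_handler;
    demo_post_send(&qp);
    demo_post_send(&qp);  /* coalesced with the first request */
    demo_run_worker();
    return qp.flushed;
}
```

Calling the sleeping path directly from demo_post_send() would trip the assertion, which is the model's stand-in for the crash described under "Problem". The cost of the deferral is the one the cover letter already names: the flush happens only when the worker gets to run.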