Hi,

This series adds support for flushing all WRs posted to hardware after the
QP state has changed to ERROR.

Old firmware may not flush WRs that are newly posted after the QP state
changed to ERROR, because it is difficult for firmware to get the real-time
PI (producer index) of QPs, especially for the RQs. Previously we tried to
avoid this issue by implementing custom drain_{sq/rq} [1], but that approach
has a flaw, as Tom and Jason pointed out, and we have also run into it in
some scenarios, for example NoF fatal recovery.

So we introduce a new mechanism to fix this. When registering the ibdev, we
create a workqueue for reflushing (we name it "reflush" because hardware has
already started flushing the QP at that time, and the work lets hardware
flush the newly posted WRs). Once a QP needs to flush WRs, or new WRs are
posted after flushing has started, we queue a delayed work on the workqueue,
or modify its delay if it is already queued. In the work, the driver
notifies firmware of the latest PIs through the CMDQ, so that firmware can
flush all the newly posted WRs.

This applies to kernel QPs first.

- #1 adds a workqueue for WRs reflushing.
- #2 adds a reflushing work for each QP.
- #3 notifies the latest PIs to firmware for reflushing.

[1] https://lore.kernel.org/all/20220824094251.23190-3-chengyou@xxxxxxxxxxxxxxxxx/t/

Thanks,
Cheng Xu

Cheng Xu (3):
  RDMA/erdma: Add a workqueue for WRs reflushing
  RDMA/erdma: Implement the lifecycle of reflushing work for each QP
  RDMA/erdma: Notify the latest PI to FW for reflushing when necessary

 drivers/infiniband/hw/erdma/erdma.h       |  1 +
 drivers/infiniband/hw/erdma/erdma_hw.h    |  8 ++++++
 drivers/infiniband/hw/erdma/erdma_main.c  | 14 +++++++++--
 drivers/infiniband/hw/erdma/erdma_qp.c    | 30 ++++++++++++++++-------
 drivers/infiniband/hw/erdma/erdma_verbs.c | 18 ++++++++++++++
 drivers/infiniband/hw/erdma/erdma_verbs.h |  7 ++++++
 6 files changed, 67 insertions(+), 11 deletions(-)

-- 
2.27.0
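
For readers who want the reflush idea from this cover letter in concrete form, here is a minimal userspace sketch. It is not the driver code: every name (struct toy_qp, toy_schedule_reflush(), REFLUSH_DELAY_TICKS, and so on) is hypothetical, the clock is simulated, and "notifying firmware" is just a copy into two fields. It only models the behavior described above: the work is queued when the QP enters ERROR, its deadline is pushed out on every later post (the semantics the kernel's mod_delayed_work() provides; whether the series uses that call is an assumption), and the latest PIs are reported once when the work finally runs.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical delay, in simulated ticks, before the reflush work runs. */
#define REFLUSH_DELAY_TICKS 5

/* Toy model of a QP; field names are illustrative, not the driver's. */
struct toy_qp {
	uint32_t sq_pi;            /* send queue producer index */
	uint32_t rq_pi;            /* receive queue producer index */
	bool in_error;             /* QP moved to ERROR, flushing has started */
	bool work_pending;         /* reflush work is currently queued */
	uint64_t work_deadline;    /* tick at which the queued work fires */
	uint32_t fw_sq_pi;         /* last SQ PI reported to "firmware" */
	uint32_t fw_rq_pi;         /* last RQ PI reported to "firmware" */
	unsigned int notify_count; /* number of notifications sent */
};

/* Queue the work, or push its deadline out if it is already queued. */
static void toy_schedule_reflush(struct toy_qp *qp, uint64_t now)
{
	qp->work_pending = true;
	qp->work_deadline = now + REFLUSH_DELAY_TICKS;
}

/* Moving to ERROR starts flushing and schedules the first reflush. */
static void toy_modify_to_error(struct toy_qp *qp, uint64_t now)
{
	qp->in_error = true;
	toy_schedule_reflush(qp, now);
}

/* A WR posted after ERROR advances the PI and re-arms the reflush work. */
static void toy_post_send(struct toy_qp *qp, uint64_t now)
{
	qp->sq_pi++;
	if (qp->in_error)
		toy_schedule_reflush(qp, now);
}

static void toy_post_recv(struct toy_qp *qp, uint64_t now)
{
	qp->rq_pi++;
	if (qp->in_error)
		toy_schedule_reflush(qp, now);
}

/* Called once per simulated tick; runs the work when its deadline passes. */
static void toy_tick(struct toy_qp *qp, uint64_t now)
{
	if (!qp->work_pending || now < qp->work_deadline)
		return;

	qp->work_pending = false;
	/* Stand-in for the CMDQ request that carries the latest PIs. */
	qp->fw_sq_pi = qp->sq_pi;
	qp->fw_rq_pi = qp->rq_pi;
	qp->notify_count++;
}

/* Three posts after ERROR are coalesced into a single notification. */
static unsigned int toy_demo(struct toy_qp *qp)
{
	uint64_t t;

	*qp = (struct toy_qp){ 0 };
	toy_modify_to_error(qp, 0); /* work due at tick 5 */
	toy_post_send(qp, 1);       /* pushed to tick 6 */
	toy_post_send(qp, 2);       /* pushed to tick 7 */
	toy_post_recv(qp, 3);       /* pushed to tick 8 */

	for (t = 4; t <= 10; t++)
		toy_tick(qp, t);

	return qp->notify_count;
}
```

Calling toy_demo() from a main() shows the intended effect of the delay: a burst of posts after the ERROR transition costs one notification instead of one per WR, and that notification carries the PIs as they stand when the work runs.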