On Wed, Nov 16, 2022 at 10:31:04AM +0800, Cheng Xu wrote:
> Hi,
>
> This series introduces support for flushing all WRs posted to hardware
> after the QP state has changed to ERROR.
>
> Old firmware may not flush WRs that are newly posted after the QP state
> changed to ERROR, because it is difficult for firmware to get the
> real-time PI (producer index) of QPs, especially for the RQs.
>
> Previously we wanted to avoid this issue by implementing custom
> drain_{sq/rq} [1], but this has a flaw, as Tom and Jason pointed out,
> which we also hit in some scenarios, for example, NoF fatal recovery.
>
> So, we introduce a new mechanism to fix this. When registering the ibdev,
> we create a workqueue for reflushing (we name it "reflush" because
> hardware has already started flushing the QPs at that time, and it is
> used to make hardware flush newly posted WRs). Once a QP needs to flush
> WRs, or new WRs are posted after flushing has started, we post a delayed
> work to the workqueue, or modify the delay time if it is already posted.
> In the work, the driver notifies firmware of the latest PIs via the CMDQ,
> so that firmware can flush all the newly posted WRs. This applies to
> kernel QPs first.
>
> - #1 adds a workqueue for WRs reflushing.
> - #2 adds a reflushing work for each QP.
> - #3 notifies the latest PIs to firmware for reflushing.
>
> [1] https://lore.kernel.org/all/20220824094251.23190-3-chengyou@xxxxxxxxxxxxxxxxx/t/
>
> Thanks,
> Cheng Xu
>
> Cheng Xu (3):
>   RDMA/erdma: Add a workqueue for WRs reflushing
>   RDMA/erdma: Implement the lifecycle of reflushing work for each QP
>   RDMA/erdma: Notify the latest PI to FW for reflushing when necessary

Applied to for-next, thanks

Jason
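
For readers following the thread, the "post a delayed work, or modify the
delay if it is already posted" step from the cover letter can be sketched in
userspace C as below. This is a simplified model under stated assumptions,
not the erdma driver code: every name here (sim_qp, sim_mod_delayed_work,
REFLUSH_DELAY_MS and so on) is hypothetical, time is passed in explicitly
instead of using kernel timers, and the CMDQ notification is reduced to
copying the PIs into a "firmware view". In the kernel, the queue-or-re-arm
behaviour is what mod_delayed_work() provides.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical delay before the reflush work runs, in milliseconds. */
#define REFLUSH_DELAY_MS 5u

/* Userspace stand-in for a delayed work item (not the kernel type). */
struct sim_delayed_work {
	bool pending;        /* work is queued and waiting for its deadline */
	uint64_t expires_ms; /* time at which the work becomes runnable */
};

/* Hypothetical per-QP state; field names are illustrative only. */
struct sim_qp {
	bool flushing;           /* QP is in ERROR, hardware started flushing */
	uint32_t sq_pi;          /* driver-side SQ producer index */
	uint32_t rq_pi;          /* driver-side RQ producer index */
	uint32_t fw_sq_pi;       /* last SQ PI reported to firmware */
	uint32_t fw_rq_pi;       /* last RQ PI reported to firmware */
	unsigned int cmdq_calls; /* number of simulated CMDQ notifications */
	struct sim_delayed_work reflush;
};

/* Queue the work, or push its deadline out if it is already queued. */
static void sim_mod_delayed_work(struct sim_delayed_work *w, uint64_t now_ms)
{
	w->pending = true;
	w->expires_ms = now_ms + REFLUSH_DELAY_MS;
}

/* QP enters ERROR: flushing begins and the reflush work is armed. */
static void sim_qp_to_error(struct sim_qp *qp, uint64_t now_ms)
{
	assert(qp != NULL);
	qp->flushing = true;
	sim_mod_delayed_work(&qp->reflush, now_ms);
}

/* Post one send WR; re-arm the reflush work if the QP is flushing. */
static void sim_post_send(struct sim_qp *qp, uint64_t now_ms)
{
	assert(qp != NULL);
	qp->sq_pi++;
	if (qp->flushing)
		sim_mod_delayed_work(&qp->reflush, now_ms);
}

/* Post one receive WR; same re-arm rule as the send path. */
static void sim_post_recv(struct sim_qp *qp, uint64_t now_ms)
{
	assert(qp != NULL);
	qp->rq_pi++;
	if (qp->flushing)
		sim_mod_delayed_work(&qp->reflush, now_ms);
}

/*
 * Run the reflush work if it is pending and due. The simulated CMDQ
 * notification copies the latest PIs into the firmware view.
 * Returns true if the work ran.
 */
static bool sim_run_reflush(struct sim_qp *qp, uint64_t now_ms)
{
	assert(qp != NULL);
	if (!qp->reflush.pending || now_ms < qp->reflush.expires_ms)
		return false;
	qp->reflush.pending = false;
	qp->fw_sq_pi = qp->sq_pi;
	qp->fw_rq_pi = qp->rq_pi;
	qp->cmdq_calls++;
	return true;
}
```

In this model, several WRs posted in quick succession after the QP enters
ERROR only push the deadline out, so they are covered by a single
notification rather than one CMDQ command per WR.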