On Mon, Sep 13, 2021 at 03:14:42PM +0300, Shai Malin wrote:
> If the HW device is during recovery, the HW resources will never return,
> hence we shouldn't wait for the CID (HW context ID) bitmaps to clear.
> This fix speeds up the error recovery flow.
>
> Fixes: 64515dc899df ("qed: Add infrastructure for error detection and recovery")
> Signed-off-by: Michal Kalderon <mkalderon@xxxxxxxxxxx>
> Signed-off-by: Ariel Elior <aelior@xxxxxxxxxxx>
> Signed-off-by: Shai Malin <smalin@xxxxxxxxxxx>
> ---
>  drivers/net/ethernet/qlogic/qed/qed_iwarp.c | 7 +++++++
>  drivers/net/ethernet/qlogic/qed/qed_roce.c  | 7 +++++++
>  2 files changed, 14 insertions(+)
>
> diff --git a/drivers/net/ethernet/qlogic/qed/qed_iwarp.c b/drivers/net/ethernet/qlogic/qed/qed_iwarp.c
> index fc8b3e64f153..4967e383c31a 100644
> --- a/drivers/net/ethernet/qlogic/qed/qed_iwarp.c
> +++ b/drivers/net/ethernet/qlogic/qed/qed_iwarp.c
> @@ -1323,6 +1323,13 @@ static int qed_iwarp_wait_for_all_cids(struct qed_hwfn *p_hwfn)
>  	int rc;
>  	int i;
>
> +	/* If the HW device is during recovery, all resources are immediately
> +	 * reset without receiving a per-cid indication from HW. In this case
> +	 * we don't expect the cid_map to be cleared.
> +	 */
> +	if (p_hwfn->cdev->recov_in_prog)
> +		return 0;

How do you ensure that this doesn't race with recovery flow?

> +
>  	rc = qed_iwarp_wait_cid_map_cleared(p_hwfn,
>  					    &p_hwfn->p_rdma_info->tcp_cid_map);
>  	if (rc)
> diff --git a/drivers/net/ethernet/qlogic/qed/qed_roce.c b/drivers/net/ethernet/qlogic/qed/qed_roce.c
> index f16a157bb95a..aff5a2871b8f 100644
> --- a/drivers/net/ethernet/qlogic/qed/qed_roce.c
> +++ b/drivers/net/ethernet/qlogic/qed/qed_roce.c
> @@ -71,6 +71,13 @@ void qed_roce_stop(struct qed_hwfn *p_hwfn)
>  	struct qed_bmap *rcid_map = &p_hwfn->p_rdma_info->real_cid_map;
>  	int wait_count = 0;
>
> +	/* If the HW device is during recovery, all resources are immediately
> +	 * reset without receiving a per-cid indication from HW. In this case
> +	 * we don't expect the cid bitmap to be cleared.
> +	 */
> +	if (p_hwfn->cdev->recov_in_prog)
> +		return;
> +
>  	/* when destroying a_RoCE QP the control is returned to the user after
>  	 * the synchronous part. The asynchronous part may take a little longer.
>  	 * We delay for a short while if an async destroy QP is still expected.
> --
> 2.22.0
>