On Tue, Feb 04, 2020 at 04:47:38PM +0800, Liuyixian (Eason) wrote:
> 
> 
> On 2020/1/29 4:05, Jason Gunthorpe wrote:
> > On Wed, Jan 15, 2020 at 05:49:13PM +0800, Yixian Liu wrote:
> >> diff --git a/drivers/infiniband/hw/hns/hns_roce_qp.c b/drivers/infiniband/hw/hns/hns_roce_qp.c
> >> index fa38582..ad7ed07 100644
> >> +++ b/drivers/infiniband/hw/hns/hns_roce_qp.c
> >> @@ -56,10 +56,16 @@ static void flush_work_handle(struct work_struct *work)
> >>  	attr_mask = IB_QP_STATE;
> >>  	attr.qp_state = IB_QPS_ERR;
> >>  
> >> -	ret = hns_roce_modify_qp(&hr_qp->ibqp, &attr, attr_mask, NULL);
> >> -	if (ret)
> >> -		dev_err(dev, "Modify QP to error state failed(%d) during CQE flush\n",
> >> -			ret);
> >> +	while (atomic_read(&hr_qp->flush_cnt)) {
> >> +		ret = hns_roce_modify_qp(&hr_qp->ibqp, &attr, attr_mask, NULL);
> >> +		if (ret)
> >> +			dev_err(dev, "Modify QP to error state failed(%d) during CQE flush\n",
> >> +				ret);
> >> +
> >> +		/* If flush_cnt larger than 1, only need one more time flush */
> >> +		if (atomic_dec_and_test(&hr_qp->flush_cnt))
> >> +			atomic_set(&hr_qp->flush_cnt, 1);
> >> +	}
> > 
> > And this while loop is just
> 
> There is a bug here, the code should be:
> 	if (!atomic_dec_and_test(&hr_qp->flush_cnt))
> 		atomic_set(&hr_qp->flush_cnt, 1);
> 
> It merges all further flush operation requirements into only one more time flush,
> that is, do the loop once again if flush_cnt larger than 1.
> 
> > 
> > if (atomic_xchg(&hr_qp->flush_cnt, 0)) {
> >    [..]
> > }
> 
> I think we can't use if instead of while loop.

Well, you can't do two operations and still have an atomic, so you
have to fix it somehow. Possibly this needs a spinlock approach
instead.

> With your solution, when user posts a new wr during the
> implementation of [...] in if condition, it will re-queue a new
> init_flush_work, which will lead to a multiple call problem as we
> discussed in v2.

queue_work can be called while a work is still running, it just makes
sure it will run again.
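The following is a minimal single-threaded user-space sketch of the flag-plus-exchange pattern. It uses C11 `<stdatomic.h>` as a stand-in for the kernel's `atomic_xchg()`. All `model_*` names are hypothetical, and the workqueue itself is not modelled; the sketch assumes the driver calls `queue_work()` unconditionally after setting the flag. The point it illustrates is ordering. The worker clears the flag before doing the modify, so a request that lands during the modify sets the flag again, and the re-run of the work picks it up.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical user-space model of the QP state touched by the flush work.
 * flush_pending is a plain flag standing in for hr_qp->flush_cnt. */
struct model_qp {
	atomic_bool flush_pending;
	int modify_calls;	/* times the "modify QP to ERR" stand-in ran */
};

/* Stand-in for hns_roce_modify_qp(..., IB_QPS_ERR); it only counts calls. */
static int model_modify_qp_to_err(struct model_qp *qp)
{
	qp->modify_calls++;
	return 0;
}

/* Producer side (post_send/post_recv path): record that a flush is needed.
 * In the driver this would be followed by an unconditional queue_work();
 * the workqueue is not modelled here. */
static void model_request_flush(struct model_qp *qp)
{
	atomic_store(&qp->flush_pending, true);
}

/* Worker side, analogue of: if (atomic_xchg(&flush_cnt, 0)) { ... }
 * The flag is cleared before the modify, so a request arriving while the
 * modify runs sets it again and the re-queued work flushes once more. */
static void model_flush_work(struct model_qp *qp)
{
	if (atomic_exchange(&qp->flush_pending, false)) {
		int ret = model_modify_qp_to_err(qp);

		assert(ret == 0);	/* the stand-in never fails */
	}
}
```

In this model, three requests made before the work runs collapse into a single modify call. A run with the flag already clear does nothing, and one request after the flag was cleared produces exactly one more modify. A boolean is enough to express "flush at least once more".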
> > I'm not even sure this needs to be a counter, all you need is set_bit()
> > and test_and_clear()
> 
> We need the value of flush_cnt larger than 1 to record further flush
> requirements, that's why flush_cnt can be defined as a flag or bit
> value.

This explanation doesn't make sense, the counter isn't being used to
count anything, it is just a flag.

Jason