On 2020/2/6 4:30, Jason Gunthorpe wrote:
> On Tue, Feb 04, 2020 at 04:47:38PM +0800, Liuyixian (Eason) wrote:
>>
>>
>> On 2020/1/29 4:05, Jason Gunthorpe wrote:
>>> On Wed, Jan 15, 2020 at 05:49:13PM +0800, Yixian Liu wrote:
>>>> diff --git a/drivers/infiniband/hw/hns/hns_roce_qp.c b/drivers/infiniband/hw/hns/hns_roce_qp.c
>>>> index fa38582..ad7ed07 100644
>>>> +++ b/drivers/infiniband/hw/hns/hns_roce_qp.c
>>>> @@ -56,10 +56,16 @@ static void flush_work_handle(struct work_struct *work)
>>>>      attr_mask = IB_QP_STATE;
>>>>      attr.qp_state = IB_QPS_ERR;
>>>>
>>>> -    ret = hns_roce_modify_qp(&hr_qp->ibqp, &attr, attr_mask, NULL);
>>>> -    if (ret)
>>>> -        dev_err(dev, "Modify QP to error state failed(%d) during CQE flush\n",
>>>> -            ret);
>>>> +    while (atomic_read(&hr_qp->flush_cnt)) {
>>>> +        ret = hns_roce_modify_qp(&hr_qp->ibqp, &attr, attr_mask, NULL);
>>>> +        if (ret)
>>>> +            dev_err(dev, "Modify QP to error state failed(%d) during CQE flush\n",
>>>> +                ret);
>>>> +
>>>> +        /* If flush_cnt larger than 1, only need one more time flush */
>>>> +        if (atomic_dec_and_test(&hr_qp->flush_cnt))
>>>> +            atomic_set(&hr_qp->flush_cnt, 1);
>>>> +    }
>>>
>>> And this while loop is just
>>
>> There is a bug here, the code should be:
>>     if (!atomic_dec_and_test(&hr_qp->flush_cnt))
>>         atomic_set(&hr_qp->flush_cnt, 1);
>>
>> It merges all further flush operation requirements into only one more time flush,
>> that is, do the loop once again if flush_cnt larger than 1.
>>
>>>
>>>     if (atomic_xchg(&hr_qp->flush_cnt, 0)) {
>>>         [..]
>>>     }
>>
>> I think we can't use if instead of while loop.
>
> Well, you can't do two operations and still have an atomic, so you
> have to fix it somehow. Possibly this needs a spinlock approach
> instead.

Agree.

>
>> With your solution, when user posts a new wr during the
>> implementation of [...] in if condition, it will re-queue a new
>> init_flush_work, which will lead to a multiple call problem as we
>> discussed in v2.
>
> queue_work can be called while a work is still running, it just makes
> sure it will run again.

Agree.

>
>>> I'm not even sure this needs to be a counter, all you need is set_bit()
>>> and test_and_clear()
>>
>> We need the value of flush_cnt large than 1 to record further flush
>> requirements, that's why flush_cnt can be defined as a flag or bit
>> value.
>
> This explanation doesn't make sense, the counter isn't being used to
> count anything, it is just a flag.

Yes, you are right. I have reconsidered the solution in light of your
suggestion: a flag is enough for the whole solution. Will fix it in v8
using the flag idea.

Thanks a lot.

>
> Jason
>
>
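
For reference, here is a rough userspace sketch of the flag idea discussed
above. It is only an illustration under assumptions, not the v8 patch: it
uses C11 atomics in place of the kernel's set_bit()/test_and_clear_bit(),
it calls the work function directly instead of going through queue_work(),
and struct demo_qp and all demo_* names are hypothetical. It shows that
several flush requests made before the work runs collapse into one flush,
and that a later request makes the work run once more.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for per-QP state; not the hns driver's struct. */
struct demo_qp {
	atomic_bool flush_pending;	/* set by requesters, cleared by the work */
	int flush_runs;			/* number of times the flush body ran */
};

/* Requester side: mark that a flush is needed. In the driver this is
 * where the work would also be queued (assumption). */
static void demo_request_flush(struct demo_qp *qp)
{
	atomic_store(&qp->flush_pending, true);
}

/* Work side: a single atomic exchange both tests and clears the flag,
 * so no separate read-then-write pair is needed. Returns true if a
 * flush was actually performed. */
static bool demo_flush_work(struct demo_qp *qp)
{
	if (!atomic_exchange(&qp->flush_pending, false))
		return false;

	qp->flush_runs++;	/* placeholder for the modify-QP-to-error step */
	return true;
}

/* Small single-threaded scenario: returns how many flushes ran. */
static int demo_coalesced_runs(void)
{
	struct demo_qp qp = { .flush_runs = 0 };

	atomic_init(&qp.flush_pending, false);

	/* Three requests arrive before the work gets to run. */
	demo_request_flush(&qp);
	demo_request_flush(&qp);
	demo_request_flush(&qp);

	demo_flush_work(&qp);	/* performs one flush for all three */
	demo_flush_work(&qp);	/* flag already clear, nothing to do */

	/* A later request causes exactly one more flush. */
	demo_request_flush(&qp);
	demo_flush_work(&qp);

	return qp.flush_runs;
}
```

In this sketch the flag carries no count at all, which matches the point
made in the review: only "is another flush needed" has to be recorded.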