Re: [PATCH v7 for-next 2/2] RDMA/hns: Delayed flush cqe process with workqueue

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 




On 2020/1/29 4:05, Jason Gunthorpe wrote:
> On Wed, Jan 15, 2020 at 05:49:13PM +0800, Yixian Liu wrote:
>> diff --git a/drivers/infiniband/hw/hns/hns_roce_qp.c b/drivers/infiniband/hw/hns/hns_roce_qp.c
>> index fa38582..ad7ed07 100644
>> +++ b/drivers/infiniband/hw/hns/hns_roce_qp.c
>> @@ -56,10 +56,16 @@ static void flush_work_handle(struct work_struct *work)
>>  	attr_mask = IB_QP_STATE;
>>  	attr.qp_state = IB_QPS_ERR;
>>  
>> -	ret = hns_roce_modify_qp(&hr_qp->ibqp, &attr, attr_mask, NULL);
>> -	if (ret)
>> -		dev_err(dev, "Modify QP to error state failed(%d) during CQE flush\n",
>> -			ret);
>> +	while (atomic_read(&hr_qp->flush_cnt)) {
>> +		ret = hns_roce_modify_qp(&hr_qp->ibqp, &attr, attr_mask, NULL);
>> +		if (ret)
>> +			dev_err(dev, "Modify QP to error state failed(%d) during CQE flush\n",
>> +				ret);
>> +
>> +		/* If flush_cnt larger than 1, only need one more time flush */
>> +		if (atomic_dec_and_test(&hr_qp->flush_cnt))
>> +			atomic_set(&hr_qp->flush_cnt, 1);
>> +	}
> 
> And this while loop is just 

There is a bug here, the code should be:
if (!atomic_dec_and_test(&hr_qp->flush_cnt))
	atomic_set(&hr_qp->flush_cnt, 1);

It merges all further flush operation requirements into only one more time flush,
that is, do the loop once again if flush_cnt larger than 1.

> 
> if (atomic_xchg(&hr_qp->flush_cnt, 0)) {
>   [..]
> }

I think we can't use if instead of while loop.
Our current solution can merge all further flush requirements after the inflection point
(the place of reading PI of SQ and RQ in hns_roce_modify_qp) into only one more time flush.
That is, the flush_cnt can be changed again by post send/recv at any place of the implementation
of hns_roce_modify_qp. We need one more time flush to update the PI of SQ and RQ.

With your solution, when user posts a new wr during the implementation of [...] in if condition,
it will re-queue a new init_flush_work, which will lead to a multiple call problem as we discussed
in v2.

> 
> I'm not even sure this needs to be a counter, all you need is set_bit()
> and test_and_clear()

We need the value of flush_cnt large than 1 to record further flush requirements, that's why
flush_cnt can be defined as a flag or bit value.





[Index of Archives]     [Linux USB Devel]     [Video for Linux]     [Linux Audio Users]     [Photo]     [Yosemite News]     [Yosemite Photos]     [Linux Kernel]     [Linux SCSI]     [XFree86]

  Powered by Linux