Re: [PATCH for-next v3 6/6] RDMA/rxe: Replace rxe_map and rxe_phys_buf by xarray

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On 1/15/23 20:21, lizhijian@xxxxxxxxxxx wrote:
> 
> 
> On 14/01/2023 07:27, Bob Pearson wrote:
>> Replace struct rxe-phys_buf and struct rxe_map by struct xarray
>> in rxe_verbs.h. This allows using rcu locking on reads for
>> the memory maps stored in each mr.
>>
>> This is based off of a sketch of a patch from Jason Gunthorpe in the
>> link below. Some changes were needed to make this work. It applies
>> cleanly to the current for-next and passes the pyverbs, perftest
>> and the same blktests test cases which run today.
>>
>> Link:https://lore.kernel.org/linux-rdma/Y3gvZr6%2FNCii9Avy@xxxxxxxxxx/
>> Co-developed-by: Jason Gunthorpe<jgg@xxxxxxxxxx>
>> Signed-off-by: Bob Pearson<rpearsonhpe@xxxxxxxxx>
>> ---
>>   drivers/infiniband/sw/rxe/rxe_loc.h   |   1 -
>>   drivers/infiniband/sw/rxe/rxe_mr.c    | 533 ++++++++++++--------------
>>   drivers/infiniband/sw/rxe/rxe_resp.c  |   2 +-
>>   drivers/infiniband/sw/rxe/rxe_verbs.h |  21 +-
>>   4 files changed, 251 insertions(+), 306 deletions(-)
>>
>> diff --git a/drivers/infiniband/sw/rxe/rxe_loc.h b/drivers/infiniband/sw/rxe/rxe_loc.h
>> index fd70c71a9e4e..0cf78f9bb27c 100644
>> --- a/drivers/infiniband/sw/rxe/rxe_loc.h
>> +++ b/drivers/infiniband/sw/rxe/rxe_loc.h
>> @@ -71,7 +71,6 @@ int copy_data(struct rxe_pd *pd, int access, struct rxe_dma_info *dma,
>>   	      void *addr, int length, enum rxe_mr_copy_dir dir);
>>   int rxe_map_mr_sg(struct ib_mr *ibmr, struct scatterlist *sg,
>>   		  int sg_nents, unsigned int *sg_offset);
>> -void *iova_to_vaddr(struct rxe_mr *mr, u64 iova, int length);
>>   int rxe_mr_do_atomic_op(struct rxe_mr *mr, u64 iova, int opcode,
>>   			u64 compare, u64 swap_add, u64 *orig_val);
>>   int rxe_mr_do_atomic_write(struct rxe_mr *mr, u64 iova, void *addr);
>> diff --git a/drivers/infiniband/sw/rxe/rxe_mr.c b/drivers/infiniband/sw/rxe/rxe_mr.c
>> index fd5537ee7f04..e4634279080a 100644
>> --- a/drivers/infiniband/sw/rxe/rxe_mr.c
>> +++ b/drivers/infiniband/sw/rxe/rxe_mr.c
> [snip...]
> 
>> -static bool is_pmem_page(struct page *pg)
>> +static unsigned long rxe_mr_iova_to_index(struct rxe_mr *mr, u64 iova)
>>   {
>> -	unsigned long paddr = page_to_phys(pg);
>> +	return (iova >> mr->page_shift) - (mr->ibmr.iova >> mr->page_shift);
>> +}
>>   
>> -	return REGION_INTERSECTS ==
>> -	       region_intersects(paddr, PAGE_SIZE, IORESOURCE_MEM,
>> -				 IORES_DESC_PERSISTENT_MEMORY);
> 
> [snip...]
> 
>> +	rxe_mr_init(access, mr);
>> +
>>   	umem = ib_umem_get(&rxe->ib_dev, start, length, access);
>>   	if (IS_ERR(umem)) {
>>   		rxe_dbg_mr(mr, "Unable to pin memory region err = %d\n",
>>   			(int)PTR_ERR(umem));
>> -		err = PTR_ERR(umem);
>> -		goto err_out;
>> +		return PTR_ERR(umem);
>>   	}
>>   
>> -	num_buf = ib_umem_num_pages(umem);
>> -
>> -	rxe_mr_init(access, mr);
>> -
>> -	err = rxe_mr_alloc(mr, num_buf);
>> +	err = rxe_mr_fill_pages_from_sgt(mr, &umem->sgt_append.sgt);
>>   	if (err) {
>> -		rxe_dbg_mr(mr, "Unable to allocate memory for map\n");
>> -		goto err_release_umem;
>> +		ib_umem_release(umem);
>> +		return err;
>>   	}
>>   
>> -	num_buf			= 0;
>> -	map = mr->map;
>> -	if (length > 0) {
>> -		bool persistent_access = access & IB_ACCESS_FLUSH_PERSISTENT;
>> -
>> -		buf = map[0]->buf;
>> -		for_each_sgtable_page (&umem->sgt_append.sgt, &sg_iter, 0) {
>> -			struct page *pg = sg_page_iter_page(&sg_iter);
>> +	mr->umem = umem;
>> +	mr->ibmr.type = IB_MR_TYPE_USER;
>> +	mr->state = RXE_MR_STATE_VALID;
>>   
>> -			if (persistent_access && !is_pmem_page(pg)) {
>> -				rxe_dbg_mr(mr, "Unable to register persistent access to non-pmem device\n");
>> -				err = -EINVAL;
>> -				goto err_release_umem;
>> -			}
> 
> I read you removed is_pmem_page and its checking, but there is no description about this.
> IMO, it's required by IBA spec.
> 
> Thanks
> Zhijian

Zhijian,

It was dropped by accident. I just posted an update with the calls put back. Please take a
look and if possible can you test to see if it doesn't regress your pmem changes. I have no
way to test pmem. Also look at a comment I added to the pmem flush call. I am suspicious of
the smp_store_release() call based on the original comment.

Regards,

Bob



[Index of Archives]     [Linux USB Devel]     [Video for Linux]     [Linux Audio Users]     [Photo]     [Yosemite News]     [Yosemite Photos]     [Linux Kernel]     [Linux SCSI]     [XFree86]

  Powered by Linux