Re: [PATCH rdma-next 1/6] RDMA/umem: Combine contiguous PAGE_SIZE regions in SGEs

On Mon, Dec 24, 2018 at 04:32:22PM -0600, Shiraz Saleem wrote:
> Combine contiguous regions of PAGE_SIZE pages
> into single scatter list entries while adding
> to the scatter table. This minimizes the number
> of the entries in the scatter list and reduces
> the DMA mapping overhead, particularly with the
> IOMMU.
> 
> Additionally, update driver call sites to use the
> for_each_sg_page variant to iterate through the
> pages in the SGE. In certain call sites, there is
> the assumption of single SGE to page mapping and
> hence the iterator variant is required. In others,
> it is simpler to use the iterator variant.
> 
> Suggested-by: Jason Gunthorpe <jgg@xxxxxxxx>
> Reviewed-by: Michael J. Ruhl <michael.j.ruhl@xxxxxxxxx>
> Signed-off-by: Shiraz Saleem <shiraz.saleem@xxxxxxxxx>
> ---
>  drivers/infiniband/core/umem.c                 |  87 ++++++++---
>  drivers/infiniband/hw/bnxt_re/ib_verbs.c       |  21 +--
>  drivers/infiniband/hw/bnxt_re/qplib_res.c      |   9 +-
>  drivers/infiniband/hw/cxgb3/iwch_provider.c    |  27 ++--
>  drivers/infiniband/hw/cxgb4/mem.c              |  31 ++--
>  drivers/infiniband/hw/hns/hns_roce_hw_v1.c     |   7 +-
>  drivers/infiniband/hw/hns/hns_roce_hw_v2.c     |  25 ++-
>  drivers/infiniband/hw/hns/hns_roce_mr.c        |  88 +++++------
>  drivers/infiniband/hw/i40iw/i40iw_verbs.c      |  35 ++---
>  drivers/infiniband/hw/mthca/mthca_provider.c   |  33 ++--
>  drivers/infiniband/hw/nes/nes_verbs.c          | 203 +++++++++++--------------
>  drivers/infiniband/hw/ocrdma/ocrdma_verbs.c    |  54 +++----
>  drivers/infiniband/hw/qedr/verbs.c             |  56 +++----
>  drivers/infiniband/hw/vmw_pvrdma/pvrdma_misc.c |  21 +--
>  drivers/infiniband/sw/rdmavt/mr.c              |   8 +-
>  drivers/infiniband/sw/rxe/rxe_mr.c             |   7 +-
>  16 files changed, 342 insertions(+), 370 deletions(-)

I think the driver parts of this should be split out and sent first,
probably as per-driver patches, so we can get proper driver maintainer
review. You need to CC the driver maintainers on postings like this.

Most of them are a general improvement, so just split them out and
post them. They can go in right away, independent of this series, once
ack'd.
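
For reference, the coalescing described in the commit message amounts
to roughly the following - an untested sketch, with a hypothetical
helper name, assuming a flat array of already-pinned pages and a
scatter table pre-allocated with enough entries:

/*
 * Sketch only: fold physically contiguous PAGE_SIZE pages into a
 * single scatterlist entry while filling the table.  Returns the
 * last entry written so the caller can count/terminate the list.
 */
static struct scatterlist *umem_coalesce_sg(struct scatterlist *sg,
					    struct page **pages,
					    unsigned long npages)
{
	unsigned long i;

	sg_set_page(sg, pages[0], PAGE_SIZE, 0);
	for (i = 1; i < npages; i++) {
		if (page_to_pfn(pages[i]) == page_to_pfn(pages[i - 1]) + 1) {
			/* physically contiguous - extend the current entry */
			sg->length += PAGE_SIZE;
		} else {
			/* discontinuity - start a new scatterlist entry */
			sg = sg_next(sg);
			sg_set_page(sg, pages[i], PAGE_SIZE, 0);
		}
	}
	return sg;
}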

> diff --git a/drivers/infiniband/hw/bnxt_re/qplib_res.c b/drivers/infiniband/hw/bnxt_re/qplib_res.c
> index 59eeac5..2ec668f 100644
> --- a/drivers/infiniband/hw/bnxt_re/qplib_res.c
> +++ b/drivers/infiniband/hw/bnxt_re/qplib_res.c
> @@ -85,7 +85,7 @@ static void __free_pbl(struct pci_dev *pdev, struct bnxt_qplib_pbl *pbl,
>  static int __alloc_pbl(struct pci_dev *pdev, struct bnxt_qplib_pbl *pbl,
>  		       struct scatterlist *sghead, u32 pages, u32 pg_size)
>  {
> -	struct scatterlist *sg;
> +	struct sg_page_iter sg_iter;
>  	bool is_umem = false;
>  	int i;
>  
> @@ -116,12 +116,13 @@ static int __alloc_pbl(struct pci_dev *pdev, struct bnxt_qplib_pbl *pbl,
>  	} else {
>  		i = 0;
>  		is_umem = true;
> -		for_each_sg(sghead, sg, pages, i) {
> -			pbl->pg_map_arr[i] = sg_dma_address(sg);
> -			pbl->pg_arr[i] = sg_virt(sg);
> +		for_each_sg_page(sghead, &sg_iter, pages, 0) {
> +			pbl->pg_map_arr[i] = sg_page_iter_dma_address(&sg_iter);
> +			pbl->pg_arr[i] = page_address(sg_page_iter_page(&sg_iter));
>  			if (!pbl->pg_arr[i])
>  				goto fail;

I wonder if pg_arr[] is still big enough now? It wasn't easy to tell,
and there is no bound on i..
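
Something like this (sketch only, reusing the names from the quoted
hunk, and assuming pg_arr[]/pg_map_arr[] hold 'pages' entries with i
advanced once per page) would at least keep i bounded:

		for_each_sg_page(sghead, &sg_iter, pages, 0) {
			if (i >= pages)
				goto fail;	/* don't run off pg_arr[] */
			pbl->pg_map_arr[i] = sg_page_iter_dma_address(&sg_iter);
			pbl->pg_arr[i] = page_address(sg_page_iter_page(&sg_iter));
			if (!pbl->pg_arr[i])
				goto fail;
			i++;
		}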

> @@ -563,19 +562,15 @@ static struct ib_mr *iwch_reg_user_mr(struct ib_pd *pd, u64 start, u64 length,
>  
>  	i = n = 0;
>  
> -	for_each_sg(mhp->umem->sg_head.sgl, sg, mhp->umem->nmap, entry) {
> -			len = sg_dma_len(sg) >> shift;
> -			for (k = 0; k < len; ++k) {
> -				pages[i++] = cpu_to_be64(sg_dma_address(sg) +
> -							 (k << shift));
> -				if (i == PAGE_SIZE / sizeof *pages) {
> -					err = iwch_write_pbl(mhp, pages, i, n);
> -					if (err)
> -						goto pbl_done;
> -					n += i;
> -					i = 0;
> -				}
> -			}
> +	for_each_sg_page(mhp->umem->sg_head.sgl, &sg_iter, mhp->umem->nmap, 0) {
> +		pages[i++] = cpu_to_be64(sg_page_iter_dma_address(&sg_iter));
> +		if (i == PAGE_SIZE / sizeof *pages) {
> +			err = iwch_write_pbl(mhp, pages, i, n);
> +			if (err)
> +				goto pbl_done;
> +			n += i;
> +			i = 0;
> +		}
>  	}

Stuff like this is a real improvement, getting rid of the double
loop - there are lots of examples of this in here..

> diff --git a/drivers/infiniband/sw/rdmavt/mr.c b/drivers/infiniband/sw/rdmavt/mr.c
> index 49c9541..8f02059 100644
> --- a/drivers/infiniband/sw/rdmavt/mr.c
> +++ b/drivers/infiniband/sw/rdmavt/mr.c
> @@ -381,8 +381,8 @@ struct ib_mr *rvt_reg_user_mr(struct ib_pd *pd, u64 start, u64 length,
>  {
>  	struct rvt_mr *mr;
>  	struct ib_umem *umem;
> -	struct scatterlist *sg;
> -	int n, m, entry;
> +	struct sg_page_iter sg_iter;
> +	int n, m;
>  	struct ib_mr *ret;
>  
>  	if (length == 0)
> @@ -411,10 +411,10 @@ struct ib_mr *rvt_reg_user_mr(struct ib_pd *pd, u64 start, u64 length,
>  	mr->mr.page_shift = umem->page_shift;
>  	m = 0;
>  	n = 0;
> -	for_each_sg(umem->sg_head.sgl, sg, umem->nmap, entry) {
> +	for_each_sg_page(umem->sg_head.sgl, &sg_iter, umem->nmap, 0) {
>  		void *vaddr;
>  
> -		vaddr = page_address(sg_page(sg));
> +		vaddr = page_address(sg_page_iter_page(&sg_iter));
>  		if (!vaddr) {
>  			ret = ERR_PTR(-EINVAL);
>  			goto bail_inval;

[..]
> 		mr->mr.map[m]->segs[n].length = BIT(umem->page_shift);

When converting to for_each_sg_page(), make sure to strip out all the
umem->page_shift usages too - each step of the sg_iter covers exactly
PAGE_SIZE. (Basically, any driver other than mlx5 using page_shift is
a mistake.)

Should check with Dennis whether rvt can handle variable page sizes
here, and if so just correct it to use the length of the SGL entry
rather than page_shift.
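
If it can't, the straightforward fix is just the constant length, ie
roughly (sketch based on the quoted hunk and the segs[] assignment
above; the surrounding n/m bookkeeping stays as-is):

	for_each_sg_page(umem->sg_head.sgl, &sg_iter, umem->nmap, 0) {
		void *vaddr = page_address(sg_page_iter_page(&sg_iter));

		if (!vaddr) {
			ret = ERR_PTR(-EINVAL);
			goto bail_inval;
		}
		mr->mr.map[m]->segs[n].vaddr = vaddr;
		/* each iteration covers exactly one PAGE_SIZE page */
		mr->mr.map[m]->segs[n].length = PAGE_SIZE;
	}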

> diff --git a/drivers/infiniband/sw/rxe/rxe_mr.c b/drivers/infiniband/sw/rxe/rxe_mr.c
> index 9d3916b..a6937de 100644
> --- a/drivers/infiniband/sw/rxe/rxe_mr.c
> +++ b/drivers/infiniband/sw/rxe/rxe_mr.c
> @@ -162,11 +162,10 @@ int rxe_mem_init_user(struct rxe_pd *pd, u64 start,
>  		      u64 length, u64 iova, int access, struct ib_udata *udata,
>  		      struct rxe_mem *mem)
>  {
> -	int			entry;
>  	struct rxe_map		**map;
>  	struct rxe_phys_buf	*buf = NULL;
>  	struct ib_umem		*umem;
> -	struct scatterlist	*sg;
> +	struct sg_page_iter	sg_iter;
>  	int			num_buf;
>  	void			*vaddr;
>  	int err;
> @@ -199,8 +198,8 @@ int rxe_mem_init_user(struct rxe_pd *pd, u64 start,
>  	if (length > 0) {
>  		buf = map[0]->buf;
>  
> -		for_each_sg(umem->sg_head.sgl, sg, umem->nmap, entry) {
> -			vaddr = page_address(sg_page(sg));
> +		for_each_sg_page(umem->sg_head.sgl, &sg_iter, umem->nmap, 0) {
> +			vaddr = page_address(sg_page_iter_page(&sg_iter));
>  			if (!vaddr) {
>  				pr_warn("null vaddr\n");
>  				err = -ENOMEM;

Same comment

Jason


