Re: [PATCH v2 rdma-core] verbs: Introduce a new reg_mr API for virtual address space

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Wed, May 29, 2019 at 03:51:32PM +0300, Yuval Shaia wrote:
> The virtual address that is registered is used as a base for any address
> passed later in post_recv and post_send operations.
> 
> On a virtualized environment this is not correct.
> 
> A guest cannot register its memory so hypervisor maps the guest physical
> address to a host virtual address and register it with the HW. Later on,
> at datapath phase, the guest fills the SGEs with addresses from its
> address space.
> 
> Since HW cannot access guest virtual address space an extra translation
> is needed to map those addresses to be based on the host virtual address
> that was registered with the HW.
> This datapath interference affects performances.
> 
> To avoid this, a logical separation between the address that is
> registered and the address that is used as a offset at datapath phase is
> needed.
> This separation is already implemented in the lower layer part
> (ibv_cmd_reg_mr) but blocked at the API level.
> 
> Fix it by introducing a new API function that accepts an address from
> guest virtual address space as well to be used as offset for later
> datapath operations.
> 
> Signed-off-by: Yuval Shaia <yuval.shaia@xxxxxxxxxx>
> v0 -> v1: 
> 	* Change reg_mr callback signature instead of adding new callback
> 	* Add the new API to libibverbs/libibverbs.map.in
> v1 -> v2: 
> 	* Do not modify reg_mr signature for version 1.0 
> 	* Add note to man page
>  libibverbs/driver.h               |  2 +-
>  libibverbs/dummy_ops.c            |  2 +-
>  libibverbs/libibverbs.map.in      |  1 +
>  libibverbs/man/ibv_reg_mr.3       | 15 +++++++++++++--
>  libibverbs/verbs.c                | 23 ++++++++++++++++++++++-
>  libibverbs/verbs.h                |  7 +++++++
>  providers/bnxt_re/verbs.c         |  6 +++---
>  providers/bnxt_re/verbs.h         |  2 +-
>  providers/cxgb3/iwch.h            |  4 ++--
>  providers/cxgb3/verbs.c           |  9 +++++----
>  providers/cxgb4/libcxgb4.h        |  4 ++--
>  providers/cxgb4/verbs.c           |  9 +++++----
>  providers/hfi1verbs/hfiverbs.h    |  4 ++--
>  providers/hfi1verbs/verbs.c       |  8 ++++----
>  providers/hns/hns_roce_u.h        |  2 +-
>  providers/hns/hns_roce_u_verbs.c  |  6 +++---
>  providers/i40iw/i40iw_umain.h     |  2 +-
>  providers/i40iw/i40iw_uverbs.c    |  8 ++++----
>  providers/ipathverbs/ipathverbs.h |  4 ++--
>  providers/ipathverbs/verbs.c      |  8 ++++----
>  providers/mlx4/mlx4.h             |  4 ++--
>  providers/mlx4/verbs.c            |  7 +++----
>  providers/mlx5/mlx5.h             |  4 ++--
>  providers/mlx5/verbs.c            |  7 +++----
>  providers/mthca/ah.c              |  3 ++-
>  providers/mthca/mthca.h           |  4 ++--
>  providers/mthca/verbs.c           |  6 +++---
>  providers/nes/nes_umain.h         |  2 +-
>  providers/nes/nes_uverbs.c        |  9 ++++-----
>  providers/ocrdma/ocrdma_main.h    |  2 +-
>  providers/ocrdma/ocrdma_verbs.c   | 10 ++++------
>  providers/qedr/qelr_main.h        |  2 +-
>  providers/qedr/qelr_verbs.c       | 11 ++++-------
>  providers/qedr/qelr_verbs.h       |  4 ++--
>  providers/rxe/rxe.c               |  6 +++---
>  providers/vmw_pvrdma/pvrdma.h     |  4 ++--
>  providers/vmw_pvrdma/verbs.c      |  7 +++----
>  37 files changed, 126 insertions(+), 92 deletions(-)
> 
> diff --git a/libibverbs/driver.h b/libibverbs/driver.h
> index e4d624b2..ef27259a 100644
> +++ b/libibverbs/driver.h
> @@ -338,7 +338,7 @@ struct verbs_context_ops {
>  				    uint64_t dm_offset, size_t length,
>  				    unsigned int access);
>  	struct ibv_mr *(*reg_mr)(struct ibv_pd *pd, void *addr, size_t length,
> -				 int access);
> +				 uint64_t hca_va, int access);
>  	int (*req_notify_cq)(struct ibv_cq *cq, int solicited_only);
>  	int (*rereg_mr)(struct verbs_mr *vmr, int flags, struct ibv_pd *pd,
>  			void *addr, size_t length, int access);
> diff --git a/libibverbs/dummy_ops.c b/libibverbs/dummy_ops.c
> index c861c3a0..61a8fbdf 100644
> +++ b/libibverbs/dummy_ops.c
> @@ -410,7 +410,7 @@ static struct ibv_mr *reg_dm_mr(struct ibv_pd *pd, struct ibv_dm *dm,
>  }
>  
>  static struct ibv_mr *reg_mr(struct ibv_pd *pd, void *addr, size_t length,
> -			     int access)
> +			     uint64_t hca_va,  int access)
>  {
>  	errno = ENOSYS;
>  	return NULL;
> diff --git a/libibverbs/libibverbs.map.in b/libibverbs/libibverbs.map.in
> index 87a1b9fc..523fd424 100644
> +++ b/libibverbs/libibverbs.map.in
> @@ -94,6 +94,7 @@ IBVERBS_1.1 {
>  		ibv_query_srq;
>  		ibv_rate_to_mbps;
>  		ibv_reg_mr;
> +		ibv_reg_mr_virt_as;
>  		ibv_register_driver;
>  		ibv_rereg_mr;
>  		ibv_resize_cq;
> diff --git a/libibverbs/man/ibv_reg_mr.3 b/libibverbs/man/ibv_reg_mr.3
> index 631e5fe8..983b5761 100644
> +++ b/libibverbs/man/ibv_reg_mr.3
> @@ -3,7 +3,7 @@
>  .\"
>  .TH IBV_REG_MR 3 2006-10-31 libibverbs "Libibverbs Programmer's Manual"
>  .SH "NAME"
> -ibv_reg_mr, ibv_dereg_mr \- register or deregister a memory region (MR)
> +ibv_reg_mr, ibv_reg_mr_virt_as, ibv_dereg_mr \- register or deregister a memory region (MR)
>  .SH "SYNOPSIS"
>  .nf
>  .B #include <infiniband/verbs.h>
> @@ -11,6 +11,10 @@ ibv_reg_mr, ibv_dereg_mr \- register or deregister a memory region (MR)
>  .BI "struct ibv_mr *ibv_reg_mr(struct ibv_pd " "*pd" ", void " "*addr" ,
>  .BI "                          size_t " "length" ", int " "access" );
>  .sp
> +.BI "struct ibv_mr *ibv_reg_mr_virt_as(struct ibv_pd " "*pd" ", void " "*addr" ,
> +.BI "                                  size_t " "length" ", uint64_t " "hca_va" ,
> +.BI "                                  int " "access" );
> +.sp
>  .BI "int ibv_dereg_mr(struct ibv_mr " "*mr" );
>  .fi
>  .SH "DESCRIPTION"
> @@ -52,11 +56,18 @@ Local read access is always enabled for the MR.
>  .PP
>  To create an implicit ODP MR, IBV_ACCESS_ON_DEMAND should be set, addr should be 0 and length should be SIZE_MAX.
>  .PP
> +.B ibv_reg_mr_virt_as()
> +Special variant of memory registration used when addresses passed to
> +ibv_post_send and ibv_post_recv are relative to base address from a
> +different address space than
> +.I addr\fR. The argument
> +.I hca_va\fR is the new base address.
> +.PP

This should also block ACCESS_ZERO_BASED for mr_virt_as as ZERO_BASED
is really just hca_va == 0

In fact, I might be inclined to re-implement ZERO_BASED in rdma-core
as just passing hca_va == 0 which would make it work on all drivers
instead of the stupid implementation we have today..

Also, not totally sold on the 'ibv_reg_mr_virt_as' for the
name.. 

How about 'ibv_reg_mr_iova'? And let us call hca_va iova in the public
interface. I think that is an existing convention.

Jason



[Index of Archives]     [Linux USB Devel]     [Video for Linux]     [Linux Audio Users]     [Photo]     [Yosemite News]     [Yosemite Photos]     [Linux Kernel]     [Linux SCSI]     [XFree86]

  Powered by Linux