On Wed, May 29, 2019 at 03:51:32PM +0300, Yuval Shaia wrote: > The virtual address that is registered is used as a base for any address > passed later in post_recv and post_send operations. > > On a virtualized environment this is not correct. > > A guest cannot register its memory so hypervisor maps the guest physical > address to a host virtual address and register it with the HW. Later on, > at datapath phase, the guest fills the SGEs with addresses from its > address space. > > Since HW cannot access guest virtual address space an extra translation > is needed to map those addresses to be based on the host virtual address > that was registered with the HW. > This datapath interference affects performances. > > To avoid this, a logical separation between the address that is > registered and the address that is used as a offset at datapath phase is > needed. > This separation is already implemented in the lower layer part > (ibv_cmd_reg_mr) but blocked at the API level. > > Fix it by introducing a new API function that accepts an address from > guest virtual address space as well to be used as offset for later > datapath operations. > > Signed-off-by: Yuval Shaia <yuval.shaia@xxxxxxxxxx> > v0 -> v1: > * Change reg_mr callback signature instead of adding new callback > * Add the new API to libibverbs/libibverbs.map.in > v1 -> v2: > * Do not modify reg_mr signature for version 1.0 > * Add note to man page > libibverbs/driver.h | 2 +- > libibverbs/dummy_ops.c | 2 +- > libibverbs/libibverbs.map.in | 1 + > libibverbs/man/ibv_reg_mr.3 | 15 +++++++++++++-- > libibverbs/verbs.c | 23 ++++++++++++++++++++++- > libibverbs/verbs.h | 7 +++++++ > providers/bnxt_re/verbs.c | 6 +++--- > providers/bnxt_re/verbs.h | 2 +- > providers/cxgb3/iwch.h | 4 ++-- > providers/cxgb3/verbs.c | 9 +++++---- > providers/cxgb4/libcxgb4.h | 4 ++-- > providers/cxgb4/verbs.c | 9 +++++---- > providers/hfi1verbs/hfiverbs.h | 4 ++-- > providers/hfi1verbs/verbs.c | 8 ++++---- > providers/hns/hns_roce_u.h | 2 +- > providers/hns/hns_roce_u_verbs.c | 6 +++--- > providers/i40iw/i40iw_umain.h | 2 +- > providers/i40iw/i40iw_uverbs.c | 8 ++++---- > providers/ipathverbs/ipathverbs.h | 4 ++-- > providers/ipathverbs/verbs.c | 8 ++++---- > providers/mlx4/mlx4.h | 4 ++-- > providers/mlx4/verbs.c | 7 +++---- > providers/mlx5/mlx5.h | 4 ++-- > providers/mlx5/verbs.c | 7 +++---- > providers/mthca/ah.c | 3 ++- > providers/mthca/mthca.h | 4 ++-- > providers/mthca/verbs.c | 6 +++--- > providers/nes/nes_umain.h | 2 +- > providers/nes/nes_uverbs.c | 9 ++++----- > providers/ocrdma/ocrdma_main.h | 2 +- > providers/ocrdma/ocrdma_verbs.c | 10 ++++------ > providers/qedr/qelr_main.h | 2 +- > providers/qedr/qelr_verbs.c | 11 ++++------- > providers/qedr/qelr_verbs.h | 4 ++-- > providers/rxe/rxe.c | 6 +++--- > providers/vmw_pvrdma/pvrdma.h | 4 ++-- > providers/vmw_pvrdma/verbs.c | 7 +++---- > 37 files changed, 126 insertions(+), 92 deletions(-) > > diff --git a/libibverbs/driver.h b/libibverbs/driver.h > index e4d624b2..ef27259a 100644 > +++ b/libibverbs/driver.h > @@ -338,7 +338,7 @@ struct verbs_context_ops { > uint64_t dm_offset, size_t length, > unsigned int access); > struct ibv_mr *(*reg_mr)(struct ibv_pd *pd, void *addr, size_t length, > - int access); > + uint64_t hca_va, int access); > int (*req_notify_cq)(struct ibv_cq *cq, int solicited_only); > int (*rereg_mr)(struct verbs_mr *vmr, int flags, struct ibv_pd *pd, > void *addr, size_t length, int access); > diff --git a/libibverbs/dummy_ops.c b/libibverbs/dummy_ops.c > index c861c3a0..61a8fbdf 100644 > +++ b/libibverbs/dummy_ops.c > @@ -410,7 +410,7 @@ static struct ibv_mr *reg_dm_mr(struct ibv_pd *pd, struct ibv_dm *dm, > } > > static struct ibv_mr *reg_mr(struct ibv_pd *pd, void *addr, size_t length, > - int access) > + uint64_t hca_va, int access) > { > errno = ENOSYS; > return NULL; > diff --git a/libibverbs/libibverbs.map.in b/libibverbs/libibverbs.map.in > index 87a1b9fc..523fd424 100644 > +++ b/libibverbs/libibverbs.map.in > @@ -94,6 +94,7 @@ IBVERBS_1.1 { > ibv_query_srq; > ibv_rate_to_mbps; > ibv_reg_mr; > + ibv_reg_mr_virt_as; > ibv_register_driver; > ibv_rereg_mr; > ibv_resize_cq; > diff --git a/libibverbs/man/ibv_reg_mr.3 b/libibverbs/man/ibv_reg_mr.3 > index 631e5fe8..983b5761 100644 > +++ b/libibverbs/man/ibv_reg_mr.3 > @@ -3,7 +3,7 @@ > .\" > .TH IBV_REG_MR 3 2006-10-31 libibverbs "Libibverbs Programmer's Manual" > .SH "NAME" > -ibv_reg_mr, ibv_dereg_mr \- register or deregister a memory region (MR) > +ibv_reg_mr, ibv_reg_mr_virt_as, ibv_dereg_mr \- register or deregister a memory region (MR) > .SH "SYNOPSIS" > .nf > .B #include <infiniband/verbs.h> > @@ -11,6 +11,10 @@ ibv_reg_mr, ibv_dereg_mr \- register or deregister a memory region (MR) > .BI "struct ibv_mr *ibv_reg_mr(struct ibv_pd " "*pd" ", void " "*addr" , > .BI " size_t " "length" ", int " "access" ); > .sp > +.BI "struct ibv_mr *ibv_reg_mr_virt_as(struct ibv_pd " "*pd" ", void " "*addr" , > +.BI " size_t " "length" ", uint64_t " "hca_va" , > +.BI " int " "access" ); > +.sp > .BI "int ibv_dereg_mr(struct ibv_mr " "*mr" ); > .fi > .SH "DESCRIPTION" > @@ -52,11 +56,18 @@ Local read access is always enabled for the MR. > .PP > To create an implicit ODP MR, IBV_ACCESS_ON_DEMAND should be set, addr should be 0 and length should be SIZE_MAX. > .PP > +.B ibv_reg_mr_virt_as() > +Special variant of memory registration used when addresses passed to > +ibv_post_send and ibv_post_recv are relative to base address from a > +different address space than > +.I addr\fR. The argument > +.I hca_va\fR is the new base address. > +.PP This should also block ACCESS_ZERO_BASED for mr_virt_as as ZERO_BASED is really just hca_va == 0 In fact, I might be inclined to re-implement ZERO_BASED in rdma-core as just passing hca_va == 0 which would make it work on all drivers instead of the stupid implementation we have today.. Also, not totally sold on the 'ibv_reg_mr_virt_as' for the name.. How about 'ibv_reg_mr_iova'? And let us call hca_va iova in the public interface. I think that is an existing convention. Jason