On Wed, May 29, 2019 at 01:42:31PM -0300, Jason Gunthorpe wrote: > On Wed, May 29, 2019 at 03:51:32PM +0300, Yuval Shaia wrote: > > The virtual address that is registered is used as a base for any address > > passed later in post_recv and post_send operations. > > > > On a virtualized environment this is not correct. > > > > A guest cannot register its memory so hypervisor maps the guest physical > > address to a host virtual address and register it with the HW. Later on, > > at datapath phase, the guest fills the SGEs with addresses from its > > address space. > > > > Since HW cannot access guest virtual address space an extra translation > > is needed to map those addresses to be based on the host virtual address > > that was registered with the HW. > > This datapath interference affects performances. > > > > To avoid this, a logical separation between the address that is > > registered and the address that is used as a offset at datapath phase is > > needed. > > This separation is already implemented in the lower layer part > > (ibv_cmd_reg_mr) but blocked at the API level. > > > > Fix it by introducing a new API function that accepts an address from > > guest virtual address space as well to be used as offset for later > > datapath operations. > > > > Signed-off-by: Yuval Shaia <yuval.shaia@xxxxxxxxxx> > > v0 -> v1: > > * Change reg_mr callback signature instead of adding new callback > > * Add the new API to libibverbs/libibverbs.map.in > > v1 -> v2: > > * Do not modify reg_mr signature for version 1.0 > > * Add note to man page > > libibverbs/driver.h | 2 +- > > libibverbs/dummy_ops.c | 2 +- > > libibverbs/libibverbs.map.in | 1 + > > libibverbs/man/ibv_reg_mr.3 | 15 +++++++++++++-- > > libibverbs/verbs.c | 23 ++++++++++++++++++++++- > > libibverbs/verbs.h | 7 +++++++ > > providers/bnxt_re/verbs.c | 6 +++--- > > providers/bnxt_re/verbs.h | 2 +- > > providers/cxgb3/iwch.h | 4 ++-- > > providers/cxgb3/verbs.c | 9 +++++---- > > providers/cxgb4/libcxgb4.h | 4 ++-- > > providers/cxgb4/verbs.c | 9 +++++---- > > providers/hfi1verbs/hfiverbs.h | 4 ++-- > > providers/hfi1verbs/verbs.c | 8 ++++---- > > providers/hns/hns_roce_u.h | 2 +- > > providers/hns/hns_roce_u_verbs.c | 6 +++--- > > providers/i40iw/i40iw_umain.h | 2 +- > > providers/i40iw/i40iw_uverbs.c | 8 ++++---- > > providers/ipathverbs/ipathverbs.h | 4 ++-- > > providers/ipathverbs/verbs.c | 8 ++++---- > > providers/mlx4/mlx4.h | 4 ++-- > > providers/mlx4/verbs.c | 7 +++---- > > providers/mlx5/mlx5.h | 4 ++-- > > providers/mlx5/verbs.c | 7 +++---- > > providers/mthca/ah.c | 3 ++- > > providers/mthca/mthca.h | 4 ++-- > > providers/mthca/verbs.c | 6 +++--- > > providers/nes/nes_umain.h | 2 +- > > providers/nes/nes_uverbs.c | 9 ++++----- > > providers/ocrdma/ocrdma_main.h | 2 +- > > providers/ocrdma/ocrdma_verbs.c | 10 ++++------ > > providers/qedr/qelr_main.h | 2 +- > > providers/qedr/qelr_verbs.c | 11 ++++------- > > providers/qedr/qelr_verbs.h | 4 ++-- > > providers/rxe/rxe.c | 6 +++--- > > providers/vmw_pvrdma/pvrdma.h | 4 ++-- > > providers/vmw_pvrdma/verbs.c | 7 +++---- > > 37 files changed, 126 insertions(+), 92 deletions(-) > > > > diff --git a/libibverbs/driver.h b/libibverbs/driver.h > > index e4d624b2..ef27259a 100644 > > +++ b/libibverbs/driver.h > > @@ -338,7 +338,7 @@ struct verbs_context_ops { > > uint64_t dm_offset, size_t length, > > unsigned int access); > > struct ibv_mr *(*reg_mr)(struct ibv_pd *pd, void *addr, size_t length, > > - int access); > > + uint64_t hca_va, int access); > > int (*req_notify_cq)(struct ibv_cq *cq, int solicited_only); > > int (*rereg_mr)(struct verbs_mr *vmr, int flags, struct ibv_pd *pd, > > void *addr, size_t length, int access); > > diff --git a/libibverbs/dummy_ops.c b/libibverbs/dummy_ops.c > > index c861c3a0..61a8fbdf 100644 > > +++ b/libibverbs/dummy_ops.c > > @@ -410,7 +410,7 @@ static struct ibv_mr *reg_dm_mr(struct ibv_pd *pd, struct ibv_dm *dm, > > } > > > > static struct ibv_mr *reg_mr(struct ibv_pd *pd, void *addr, size_t length, > > - int access) > > + uint64_t hca_va, int access) > > { > > errno = ENOSYS; > > return NULL; > > diff --git a/libibverbs/libibverbs.map.in b/libibverbs/libibverbs.map.in > > index 87a1b9fc..523fd424 100644 > > +++ b/libibverbs/libibverbs.map.in > > @@ -94,6 +94,7 @@ IBVERBS_1.1 { > > ibv_query_srq; > > ibv_rate_to_mbps; > > ibv_reg_mr; > > + ibv_reg_mr_virt_as; > > ibv_register_driver; > > ibv_rereg_mr; > > ibv_resize_cq; > > diff --git a/libibverbs/man/ibv_reg_mr.3 b/libibverbs/man/ibv_reg_mr.3 > > index 631e5fe8..983b5761 100644 > > +++ b/libibverbs/man/ibv_reg_mr.3 > > @@ -3,7 +3,7 @@ > > .\" > > .TH IBV_REG_MR 3 2006-10-31 libibverbs "Libibverbs Programmer's Manual" > > .SH "NAME" > > -ibv_reg_mr, ibv_dereg_mr \- register or deregister a memory region (MR) > > +ibv_reg_mr, ibv_reg_mr_virt_as, ibv_dereg_mr \- register or deregister a memory region (MR) > > .SH "SYNOPSIS" > > .nf > > .B #include <infiniband/verbs.h> > > @@ -11,6 +11,10 @@ ibv_reg_mr, ibv_dereg_mr \- register or deregister a memory region (MR) > > .BI "struct ibv_mr *ibv_reg_mr(struct ibv_pd " "*pd" ", void " "*addr" , > > .BI " size_t " "length" ", int " "access" ); > > .sp > > +.BI "struct ibv_mr *ibv_reg_mr_virt_as(struct ibv_pd " "*pd" ", void " "*addr" , > > +.BI " size_t " "length" ", uint64_t " "hca_va" , > > +.BI " int " "access" ); > > +.sp > > .BI "int ibv_dereg_mr(struct ibv_mr " "*mr" ); > > .fi > > .SH "DESCRIPTION" > > @@ -52,11 +56,18 @@ Local read access is always enabled for the MR. > > .PP > > To create an implicit ODP MR, IBV_ACCESS_ON_DEMAND should be set, addr should be 0 and length should be SIZE_MAX. > > .PP > > +.B ibv_reg_mr_virt_as() > > +Special variant of memory registration used when addresses passed to > > +ibv_post_send and ibv_post_recv are relative to base address from a > > +different address space than > > +.I addr\fR. The argument > > +.I hca_va\fR is the new base address. > > +.PP > > This should also block ACCESS_ZERO_BASED for mr_virt_as as ZERO_BASED > is really just hca_va == 0 > > In fact, I might be inclined to re-implement ZERO_BASED in rdma-core > as just passing hca_va == 0 which would make it work on all drivers > instead of the stupid implementation we have today.. > > Also, not totally sold on the 'ibv_reg_mr_virt_as' for the > name.. I didn't liked it also, just because it came from virtualization perspective in mind i choose virt-address-space. > > How about 'ibv_reg_mr_iova'? And let us call hca_va iova in the public > interface. I think that is an existing convention. Sending v3. > > Jason