RE: [RFC rdma-core 2/2] verbs: Introduce non-contiguous memory registration

> -----Original Message-----
> From: Alex Margolin
> Sent: Thursday, January 25, 2018 2:43 PM
> To: 'Yuval Shaia' <yuval.shaia@xxxxxxxxxx>; Marcel Apfelbaum
> <marcel@xxxxxxxxxx>
> Cc: Jason Gunthorpe <jgg@xxxxxxxxxxxx>; linux-rdma@xxxxxxxxxxxxxxx
> Subject: RE: [RFC rdma-core 2/2] verbs: Introduce non-contiguous memory
> registration
> 
> 
> 
> > -----Original Message-----
> > From: Yuval Shaia [mailto:yuval.shaia@xxxxxxxxxx]
> > Sent: Tuesday, January 23, 2018 10:30 PM
> > To: Alex Margolin <alexma@xxxxxxxxxxxx>; Marcel Apfelbaum
> > <marcel@xxxxxxxxxx>
> > Cc: Jason Gunthorpe <jgg@xxxxxxxxxxxx>; linux-rdma@xxxxxxxxxxxxxxx
> > Subject: Re: [RFC rdma-core 2/2] verbs: Introduce non-contiguous
> > memory registration
> >
> > On Mon, Jan 22, 2018 at 03:59:51PM +0000, Alex Margolin wrote:
> > > > -----Original Message-----
> > > > From: Jason Gunthorpe
> > > > Sent: Thursday, January 11, 2018 6:45 PM
> > > > To: Yuval Shaia <yuval.shaia@xxxxxxxxxx>
> > > > Cc: Alex Margolin <alexma@xxxxxxxxxxxx>;
> > > > linux-rdma@xxxxxxxxxxxxxxx
> > > > Subject: Re: [RFC rdma-core 2/2] verbs: Introduce non-contiguous
> > > > memory registration
> > > >
> > > > On Thu, Jan 11, 2018 at 02:22:07PM +0200, Yuval Shaia wrote:
> > > > > > +The following code example demonstrates non-contiguous memory
> > > > > > +registration, by combining two contiguous regions, along with
> > > > > > +the WR-based completion semantic:
> > > > > > +.PP
> > > > > > +.nf
> > > > > > +mr1 = ibv_reg_mr(pd, addr1, len1, 0);
> > > > > > +if (!mr1) {
> > > > > > +        fprintf(stderr, "Failed to create MR #1\en");
> > > > > > +        return 1;
> > > > > > +}
> > > > > > +
> > > > > > +mr2 = ibv_reg_mr(pd, addr2, len2, 0);
> > > > > > +if (!mr2) {
> > > > > > +        fprintf(stderr, "Failed to create MR #2\en");
> > > > > > +        return 1;
> > > > > > +}
> > > > >
> > > > > So, to register 512 random non-contiguous buffers, I would have
> > > > > to create 512 MRs?
> > >
> > >
> > > I think that typically, if you have a large number of buffers, they
> > > would be located in fairly close proximity, so you'd prefer one MR to
> > > cover all of them, with the SGEs differing only in base address.
> >
> > Define "large amount".
> > I did several experiments with something like a hundred or a few
> > hundred buffers (Marcel, do you remember how many?), and they were
> > scattered across a range of about 3G, so one MR is not an option. Our
> > application is QEMU, so a single 3G MR means no memory overcommit.
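To make the trade-off concrete, here is a minimal C sketch (not part of the RFC; `struct buf_desc`, `covering_span()` and `used_bytes()` are hypothetical helpers) that computes the single contiguous range one covering ibv_reg_mr() call would have to register for a set of scattered buffers, next to the bytes the buffers actually use.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical (addr, length) descriptor for one application buffer;
 * each one would otherwise become a separate SGE. */
struct buf_desc {
    uint64_t addr;
    uint32_t length;
};

/* Smallest contiguous [start, start + span) range covering every buffer.
 * A single ibv_reg_mr() over this range would also pin the gaps between
 * the buffers. Returns the span in bytes (0 if n == 0). */
static uint64_t covering_span(const struct buf_desc *bufs, size_t n,
                              uint64_t *start_out)
{
    assert(start_out != NULL);
    if (n == 0) {
        *start_out = 0;
        return 0;
    }
    uint64_t start = bufs[0].addr;
    uint64_t end = bufs[0].addr + bufs[0].length;
    for (size_t i = 1; i < n; i++) {
        if (bufs[i].addr < start)
            start = bufs[i].addr;
        if (bufs[i].addr + bufs[i].length > end)
            end = bufs[i].addr + bufs[i].length;
    }
    *start_out = start;
    return end - start;
}

/* Bytes the application actually uses: the sum of buffer lengths. */
static uint64_t used_bytes(const struct buf_desc *bufs, size_t n)
{
    uint64_t total = 0;
    for (size_t i = 0; i < n; i++)
        total += bufs[i].length;
    return total;
}
```

With three 4 KiB buffers at 0x10000000, 0x50000000 and 0xCFFFF000, covering_span() returns 3 GiB (0xC0000000 bytes) while used_bytes() returns only 12288 bytes, which is why a single covering MR rules out overcommit in this scenario.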
> >
> > >
> > > Are you proposing that the function also replace ibv_reg_mr() if the
> > > user passes multiple unregistered regions?
> > > I could see the benefit, but then we'd require additional parameters
> > > (e.g. send_flags), and those MRs couldn't be reused (otherwise we'd
> > > need to add output pointers for the resulting MRs).
> 
> Actually, I realized it can be implemented with the proposed API.
> All that is missing is a capability bit and a flag for set_layout_*, and
> the implementation could work as follows (changes relative to the SG
> example):
> 
> +assert(caps & IBV_MR_SET_LAYOUT_INTERNAL_REGISTRATION);
> -mr1 = ibv_reg_mr(pd, addr1, len1, 0);
> -if (!mr1) {
> -        fprintf(stderr, "Failed to create MR #1\en");
> -        return 1;
> -}
> -
> -mr2 = ibv_reg_mr(pd, addr2, len2, 0);
> -if (!mr2) {
> -        fprintf(stderr, "Failed to create MR #2\en");
> -        return 1;
> -}
> 
> mr3 = ibv_reg_mr(pd, NULL, 0, IBV_ACCESS_ZERO_BASED);
> if (!mr3) {
>         fprintf(stderr, "Failed to create result MR\en");
>         return 1;
> }
> 
> struct ibv_sge composite[] =
> {
>         {
>                 .addr = addr1,
>                 .length = len1,
> -                .lkey = mr1->lkey
>         },
>         {
>                 .addr = addr2,
>                 .length = len2,
> -                .lkey = mr2->lkey
>         }
> };
> 
> -ret = ibv_mr_set_layout_sg(mr3, 0, 2, composite);
> +ret = ibv_mr_set_layout_sg(mr3, IBV_MR_SET_LAYOUT_REGISTER_BUFFERS, 2,
> +                           composite);
> if (ret) {
>         fprintf(stderr, "Non-contiguous registration failed\en");
>         return 1;
> }
> 
> In this case, calling ibv_mr_set_layout_sg() will trigger an internal
> registration that replaces the ibv_reg_mr() calls for mr1 and mr2, and
> the resulting registration will be stored in mr3.
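A minimal C sketch of the layout this relies on, under the assumption that the MR built by ibv_mr_set_layout_sg() is the in-order concatenation of the composite entries, addressed by zero-based offsets because mr3 was created with IBV_ACCESS_ZERO_BASED. `struct sge_sketch`, `composite_length()` and `composite_translate()` are hypothetical and only model the address mapping; they do not call libibverbs.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Mirrors the .addr/.length fields of struct ibv_sge; .lkey is left out
 * because, with internal registration, the entries are unregistered. */
struct sge_sketch {
    uint64_t addr;
    uint32_t length;
};

/* Total length of the composite MR: the sum of all entry lengths. */
static uint64_t composite_length(const struct sge_sketch *sg, size_t n)
{
    uint64_t total = 0;
    for (size_t i = 0; i < n; i++)
        total += sg[i].length;
    return total;
}

/* Translate a zero-based offset into the composite MR to the virtual
 * address it refers to, walking the entries in order.
 * Returns 0 on success, -1 if the offset is past the end. */
static int composite_translate(const struct sge_sketch *sg, size_t n,
                               uint64_t offset, uint64_t *addr_out)
{
    assert(addr_out != NULL);
    for (size_t i = 0; i < n; i++) {
        if (offset < sg[i].length) {
            *addr_out = sg[i].addr + offset;
            return 0;
        }
        offset -= sg[i].length;
    }
    return -1;
}
```

For entries of 100 and 50 bytes, offset 100 maps to the first byte of the second buffer and the composite is 150 bytes long; an offset of 150 or more falls outside it.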

Forgot to add: MR creation parameters, such as access flags, will be taken from the ibv_reg_mr() call that created mr3.

> 
> Is this what you had in mind?
> 
> >
> > Yeah, more or less the same as ib_reg_mr, but one that takes a list of
> > pages instead of a virtual address, skips the "while (npages)" loop in
> > ib_umem_get and goes directly to dma_map_sg. The idea here is that the
> > HW supports a scattered list of buffers anyway, so why limit the API to
> > a contiguous virtual address range?
> >
> > We dropped this idea because it turns out that we need extra help from
> > the HW in the post_send phase, where the virtual address received in
> > the SGE refers to the virtual address given at ib_reg_mr.
> > We somehow believed that a zero-based MR would solve this, perhaps by
> > allowing addresses in the SGE to be something like an index to an
> > entry in the page list given to ib_reg_mr, but apparently zero-based
> > MRs are not yet functional (at least not on CX3).
> > (We lack knowledge of what exactly a zero-based MR is.)
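For reference, a zero-based MR (as requested with IBV_ACCESS_ZERO_BASED in the example above) is addressed by byte offset from the start of the MR rather than by virtual address. Combined with a page list, such an offset would indeed behave like an index into the list. A minimal C sketch, assuming 4 KiB pages; `zero_based_lookup()` is hypothetical and models only the translation, not how any particular HCA (such as CX3) implements it.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_PAGE_SIZE 4096ULL /* assumed page size */

/* Resolve a zero-based byte offset against a hypothetical list of
 * page-aligned page addresses that make up the MR. The offset selects the
 * page (offset / page size) and the byte within it (offset % page size).
 * Returns 0 and stores the address, or -1 if the offset is out of range. */
static int zero_based_lookup(const uint64_t *pages, size_t npages,
                             uint64_t offset, uint64_t *addr_out)
{
    assert(addr_out != NULL);
    uint64_t idx = offset / SKETCH_PAGE_SIZE;
    if (idx >= npages)
        return -1;
    *addr_out = pages[idx] + offset % SKETCH_PAGE_SIZE;
    return 0;
}
```

The pages need not be adjacent or sorted: offset 4096 + 16 resolves to byte 16 of the second listed page, wherever that page lives.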
> >
> > > The benefit will probably not be latency, though, since IIRC MR
> > > creation can't really be parallelized.
> > > Yuval, are you aware of a scenario involving a large number of
> > > ibv_reg_mr() calls?
> >
> > A large number of ibv_reg_mr calls, no, but I have a scenario where my
> > application can potentially receive a request to create an MR for
> > 262144 scattered pages.
> > By the way, even with the API Jason suggests below, the SG list will
> > still limit us; I'm not sure how big an SG list can be, but surely not
> > 262144 entries.
> > So what we were thinking is to give ib_reg_mr a huge range, even 4G,
> > but then use a bitmap parameter that specifies only the pages in that
> > range that take part in the MR.
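A minimal C sketch of the bitmap idea, assuming 4 KiB pages. `struct range_bitmap_req` and its helpers are hypothetical; neither ib_reg_mr nor ibv_reg_mr takes such a parameter. With one bit per page, describing a 4G range costs a fixed 128 KiB no matter how many pages are selected.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

#define SKETCH_PAGE_SHIFT 12 /* assumed 4 KiB pages */

/* Hypothetical request: one large virtual range plus a bitmap with one bit
 * per page; a set bit means the page takes part in the MR. */
struct range_bitmap_req {
    uint64_t base;   /* page-aligned start of the range */
    uint64_t npages; /* pages covered by the range */
    uint8_t *bitmap; /* (npages + 7) / 8 bytes, zero-initialized */
};

/* Allocate an empty bitmap for [base, base + length). Returns 0 or -1. */
static int req_init(struct range_bitmap_req *req, uint64_t base,
                    uint64_t length)
{
    req->base = base;
    req->npages = length >> SKETCH_PAGE_SHIFT;
    req->bitmap = calloc((size_t)((req->npages + 7) / 8), 1);
    return req->bitmap ? 0 : -1;
}

/* Mark the page containing addr as part of the MR. */
static void req_add_page(struct range_bitmap_req *req, uint64_t addr)
{
    assert(addr >= req->base);
    uint64_t idx = (addr - req->base) >> SKETCH_PAGE_SHIFT;
    assert(idx < req->npages);
    req->bitmap[idx / 8] |= (uint8_t)(1u << (idx % 8));
}

/* Number of pages selected by the bitmap. */
static uint64_t req_count(const struct range_bitmap_req *req)
{
    uint64_t count = 0;
    for (uint64_t i = 0; i < req->npages; i++)
        count += (req->bitmap[i / 8] >> (i % 8)) & 1u;
    return count;
}

/* Size of the bitmap in bytes. */
static size_t req_bitmap_bytes(const struct range_bitmap_req *req)
{
    return (size_t)((req->npages + 7) / 8);
}
```

Selecting every fourth page of a 4 GiB range gives the 262144 scattered pages mentioned above; the bitmap stays at 131072 bytes, whereas one 16-byte struct ibv_sge per page would need 4 MiB for the same selection.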
> >
> > >
> > > >
> > > > That is a fair point. I wonder if some of these APIs should have
> > > > an option to accept a pointer directly? Maybe the driver requires
> > > > an MR, but we don't need that in the API?
> > > >
> > > > Particularly the _sg one.
> > > >
> > > > Jason
> > > --
> > > To unsubscribe from this list: send the line "unsubscribe linux-rdma"
> > > in the body of a message to majordomo@xxxxxxxxxxxxxxx
> > > More majordomo info at  http://vger.kernel.org/majordomo-info.html