Re: Kernel fast memory registration API proposal [RFC]

On Jul 15, 2015, at 4:01 AM, Sagi Grimberg <sagig@xxxxxxxxxxxxxxxxxx> wrote:

> On 7/14/2015 8:09 PM, Jason Gunthorpe wrote:
>> On Tue, Jul 14, 2015 at 07:55:39PM +0300, Sagi Grimberg wrote:
>> 
>>> But if people think that it's better to have an API that always does
>>> implicit posting without notification, and then silently consumes error
>>> or flush completions, I can try and look at it as well.
>> 
>> Can we do FMR transparently if we bundle the post? If yes, I'd call
>> that a winner..
> 
> Doing FMR transparently is not possible because the FMR unmap flow can
> schedule (sleep). Unlike NFS, iSER unmaps from a soft-IRQ context, and
> SRP unmaps from hard-IRQ context.

The context in which RPC/RDMA performs FMR unmap mustn’t sleep.
RPC/RDMA is in roughly the same situation as the other initiators.


> Changing the context to thread context is not
> acceptable. The best we can do is to use FMR_POOLs transparently.
> Besides polluting the API and its semantics, I suspect people will
> have other problems with it (leaving the MRs open).

Count me in that group.

I would rather not build a non-deterministic delay into the
unmap interface. Using a pool or having map do an implicit
unmap are both solutions I’d rather avoid.

In both situations, MRs can be left mapped indefinitely if,
say, the workload pauses.
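
To make the concern concrete, here is a toy userspace model of deferred
unmapping. It is only a sketch under stated assumptions: struct toy_pool,
the watermark policy, and every function name are hypothetical and are not
the kernel's FMR pool code. Unmap merely marks an MR dirty; real
invalidation happens only once enough dirty MRs accumulate.

```c
#include <assert.h>

#define TOY_POOL_SIZE 8

/* Hypothetical MR states for this toy model only. */
enum toy_mr_state { TOY_MR_FREE, TOY_MR_IN_USE, TOY_MR_DIRTY };

struct toy_pool {
    enum toy_mr_state state[TOY_POOL_SIZE];
    int dirty;      /* MRs released by the ULP but not yet invalidated */
    int watermark;  /* invalidate in bulk once this many are dirty */
};

static void toy_pool_init(struct toy_pool *p, int watermark)
{
    for (int i = 0; i < TOY_POOL_SIZE; i++)
        p->state[i] = TOY_MR_FREE;
    p->dirty = 0;
    p->watermark = watermark;
}

/* Bulk invalidation: only now do dirty MRs stop being accessible. */
static void toy_pool_flush(struct toy_pool *p)
{
    for (int i = 0; i < TOY_POOL_SIZE; i++)
        if (p->state[i] == TOY_MR_DIRTY)
            p->state[i] = TOY_MR_FREE;
    p->dirty = 0;
}

static void toy_pool_map(struct toy_pool *p, int idx)
{
    assert(idx >= 0 && idx < TOY_POOL_SIZE);
    assert(p->state[idx] == TOY_MR_FREE);
    p->state[idx] = TOY_MR_IN_USE;
}

/* The ULP is done with the MR, but the mapping is left in place. */
static void toy_pool_unmap(struct toy_pool *p, int idx)
{
    assert(idx >= 0 && idx < TOY_POOL_SIZE);
    assert(p->state[idx] == TOY_MR_IN_USE);
    p->state[idx] = TOY_MR_DIRTY;
    if (++p->dirty >= p->watermark)
        toy_pool_flush(p);
}

/* Number of MRs the ULP has released that are still mapped. */
static int toy_pool_exposed(const struct toy_pool *p)
{
    return p->dirty;
}
```

With a watermark of 4, releasing three MRs and then going idle leaves all
three mapped until some later unmap happens to trigger the flush.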


> I suggest starting with what I proposed. At a later stage (if we
> still think it's needed) we can have a higher-level API that hides the
> post, something like:

> rdma_reg_sg(struct ib_qp *qp,
>            struct ib_mr *mr,
>            struct scatterlist *sg,
>            int sg_nents,
>            u64 offset,
>            u64 length,
>            int access_flags)

I still wonder what “length” means in the context of a scatterlist.
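
One possible reading, offered only as a guess at the intent, is that
"offset" and "length" select a byte window over the concatenation of all
entries. The userspace sketch below uses a hypothetical struct mock_sg in
place of struct scatterlist, and sg_window_nents() is an invented helper,
not a proposed kernel interface. It reports how many entries such a window
touches, or fails if the window runs past the end of the list.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for struct scatterlist: only the byte count matters. */
struct mock_sg {
    uint32_t length;    /* bytes in this entry */
};

/*
 * Treat (offset, length) as a byte window over the concatenated entries.
 * Returns the number of entries the window overlaps, or -1 if the window
 * extends beyond the total number of bytes in the list.
 */
static int sg_window_nents(const struct mock_sg *sg, int sg_nents,
                           uint64_t offset, uint64_t length)
{
    uint64_t pos = 0;   /* byte position where the current entry starts */
    uint64_t end;
    int touched = 0;

    assert(sg != NULL || sg_nents == 0);

    if (length > UINT64_MAX - offset)
        return -1;      /* offset + length would overflow */
    end = offset + length;

    for (int i = 0; i < sg_nents; i++) {
        uint64_t ent_end = pos + sg[i].length;

        /* Does [pos, ent_end) overlap the window [offset, end)? */
        if (length > 0 && ent_end > offset && pos < end)
            touched++;
        pos = ent_end;
    }

    /* pos now holds the total byte count of the list. */
    return end <= pos ? touched : -1;
}
```

Under this reading, "length" is redundant whenever the caller wants the
whole list, and it can disagree with sg_nents, which is part of why the
parameter's meaning needs to be spelled out.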


> rdma_unreg_mr(struct ib_qp *qp,
>              struct ib_mr *mr)

An implicit caveat to using this is that the ULP would have to
ensure the “qp” parameter is not NULL and that the referenced
QP will not be destroyed during this call.

So these calls have to be serialized with transport connect and
device removal.

The philosophical preference would be that the API should take
care of this itself, but I’m not smart enough to see how that
can be done.
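
For illustration, the ULP-side serialization described above might look
roughly like the userspace sketch below. Everything in it is an
assumption: struct mock_qp and struct mock_xprt are hypothetical
stand-ins, and a C11 atomic_flag spin lock substitutes for whatever
non-sleeping lock a real ULP would use in its unmap context.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

struct mock_qp { int id; };     /* hypothetical stand-in for struct ib_qp */

struct mock_xprt {
    atomic_flag lock;           /* non-sleeping lock guarding ->qp */
    struct mock_qp *qp;         /* NULL once the connection is torn down */
    int posted;                 /* invalidations handed to the QP */
};

static void xprt_lock(struct mock_xprt *x)
{
    while (atomic_flag_test_and_set_explicit(&x->lock, memory_order_acquire))
        ;                       /* spin; never sleeps */
}

static void xprt_unlock(struct mock_xprt *x)
{
    atomic_flag_clear_explicit(&x->lock, memory_order_release);
}

static void xprt_init(struct mock_xprt *x, struct mock_qp *qp)
{
    assert(x != NULL);
    atomic_flag_clear(&x->lock);
    x->qp = qp;
    x->posted = 0;
}

/* Unregister path: only touch the QP while holding the lock. */
static int xprt_unreg(struct mock_xprt *x)
{
    int ret = -1;               /* -1: no QP, nothing was posted */

    xprt_lock(x);
    if (x->qp != NULL) {
        x->posted++;            /* stands in for posting the invalidate */
        ret = 0;
    }
    xprt_unlock(x);
    return ret;
}

/* Disconnect or device removal: clear the pointer under the same lock. */
static void xprt_disconnect(struct mock_xprt *x)
{
    xprt_lock(x);
    x->qp = NULL;
    xprt_unlock(x);
}
```

Every ULP would have to carry some version of this check, which is the
burden an API that handled it internally would remove.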


> Or incorporate that with a pool API, something like:

FRWR does not need a pool. I’d rather not burden this API
with what is essentially an FMR workaround that introduces a
non-deterministic exposure of the data in each MR.


> rdma_create_fr_pool(struct ib_qp *qp,
>                    int nmrs,
>                    int mr_size,
>                    int create_flags)
> 
> rdma_destroy_fr_pool(struct rdma_fr_pool *pool)
> 
> rdma_fr_reg_sg(struct rdma_fr_pool *pool,
>               struct scatterlist *sg,
>               int sg_nents,
>               u64 offset,
>               u64 length,
>               int access_flags)
> 
> rdma_fr_unreg_mr(struct rdma_fr_pool *pool,
>                 struct ib_mr *mr)
> 
> 
> Note that I expect problems with both approaches, but
> we can look into it...
> 
> Sagi.

--
Chuck Lever


