Re: [PATCH v1 00/16] NFS/RDMA patches proposed for 4.1

On Tue, May 05, 2015 at 04:57:55PM -0400, Tom Talpey wrote:
> Actually, I strongly disagree that the in-kernel consumers want to
> register a struct page. They want to register a list of pages, often
> a rather long one. They want this because it allows the RDMA layer to
> address the list with a single memory handle. This is where things
> get tricky.

Yes, I agree - my wording was wrong and if you look at the next point
it should be obvious that I meant multiple struct pages.

> So the "pinned" or "wired" term is because in order to do RDMA, the
> page needs to have a fixed mapping to this handle. Usually, that means
> a physical address. There are some new approaches that allow the NIC
> to raise a fault and/or walk kernel page tables, but one way or the
> other the page had better be resident. RDMA NICs, generally speaking,
> don't buffer in-flight RDMA data, nor do you want them to.

But that whole pain point only exists for userspace ib verbs consumers.
Any in-kernel consumer fits into the "pinned" or "wired" category,
as any local DMA requires it.

> >  - In many but not all cases we might need an offset/length for each
> >    page (think struct bvec, paged sk_buffs, or scatterlists of some
> >    sort), in other an offset/len for the whole set of pages is fine,
> >    but that's a superset of the one above.
> 
> Yep, RDMA calls this FBO and length, and further, the protocol requires
> that the data itself be contiguous within the registration, that is, the
> FBO can be non-zero, but no other holes be present.

The contiguity requirement isn't something we can always guarantee.
While a lot of I/O will have that form, layouts with holes can
happen, although they're not common.
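To make the FBO rule above concrete, here is a minimal sketch of the check a consumer would have to make before attempting a single registration. The `struct page_seg` type and `PAGE_SZ` are illustrative stand-ins for a scatterlist entry, not real kernel API: the first segment may start at a non-zero offset (the FBO), the last may end short, but any interior hole disqualifies the list.

```c
#include <assert.h>
#include <stddef.h>

#define PAGE_SZ 4096u

/* Hypothetical per-page descriptor, standing in for a scatterlist
 * entry (names are illustrative, not kernel API). */
struct page_seg {
	size_t offset;	/* offset of data within the page */
	size_t len;	/* bytes of data in this page */
};

/* Check the RDMA layout rule: a non-zero first-byte offset (FBO) is
 * fine, but every segment between the first and the last must cover
 * a whole page -- no interior holes. */
static int is_contiguous(const struct page_seg *segs, size_t n)
{
	for (size_t i = 0; i < n; i++) {
		if (i > 0 && segs[i].offset != 0)
			return 0;	/* hole before segment i */
		if (i < n - 1 && segs[i].offset + segs[i].len != PAGE_SZ)
			return 0;	/* hole after segment i */
	}
	return 1;
}
```

A list like {offset 100, len 3996}, {0, 4096}, {0, 100} passes (FBO of 100, no interior holes), while {0, 4000}, {0, 4096} fails because the first page ends 96 bytes early.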

> >  - we usually want it to be as fast as possible
> 
> In the case of file protocols such as NFS/RDMA and SMB Direct, as well
> as block protocols such as iSER, these registrations are set up and
> torn down on a per-I/O basis, in order to protect the data from
> misbehaving peers or misbehaving hardware. So to me as a storage
> protocol provider, "usually" means "always".

Yes.  As I said, I haven't actually found anything yet that doesn't fit
the pattern, but the in-kernel RDMA API is such a mess that I didn't
want to put my hand in the fire and say "always".

> I totally get where you're coming from, my main question is whether
> it's possible to nail the requirements of some useful common API.
> It has been tried before, shall I say.

Do you have any information on these attempts and why they failed?  Note
that the only interesting ones would be for in-kernel consumers.
Userspace verbs are an order of magnitude more problematic, so they're
not too interesting.
--
To unsubscribe from this list: send the line "unsubscribe linux-nfs" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html



