On Tue, May 05, 2015 at 04:57:55PM -0400, Tom Talpey wrote:
> Actually, I strongly disagree that the in-kernel consumers want to
> register a struct page. They want to register a list of pages, often
> a rather long one. They want this because it allows the RDMA layer to
> address the list with a single memory handle. This is where things
> get tricky.

Yes, I agree - my wording was wrong, and if you look at the next point
it should be obvious that I meant multiple struct pages.

> So the "pinned" or "wired" term is because in order to do RDMA, the
> page needs to have a fixed mapping to this handle. Usually, that means
> a physical address. There are some new approaches that allow the NIC
> to raise a fault and/or walk kernel page tables, but one way or the
> other the page had better be resident. RDMA NICs, generally speaking,
> don't buffer in-flight RDMA data, nor do you want them to.

But that whole pain point only exists for userspace ib verbs consumers.
Any in-kernel consumer fits into the "pinned" or "wired" category,
as any local DMA requires it.

> > - In many but not all cases we might need an offset/length for each
> > page (think struct bvec, paged sk_buffs, or scatterlists of some
> > sort), in other an offset/len for the whole set of pages is fine,
> > but that's a superset of the one above.
>
> Yep, RDMA calls this FBO and length, and further, the protocol requires
> that the data itself be contiguous within the registration, that is, the
> FBO can be non-zero, but no other holes be present.

The contiguity requirement isn't something we can always guarantee.
While a lot of I/O will have that form, the form where there are holes
can happen, although it's not common.

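
To make the contiguity rule above concrete, here is a minimal userspace
sketch in C. It assumes a 4096-byte page, and struct seg, PAGE_SZ and
segs_are_contiguous() are hypothetical names invented for illustration
(loosely modelled on a bvec-style offset/length pair); none of them are
part of any kernel or verbs API. It only checks whether a list of
per-page segments has the "non-zero FBO allowed, no other holes" shape.

```c
#include <stdbool.h>
#include <stddef.h>

#define PAGE_SZ 4096u /* assumed page size for this sketch */

/* Hypothetical per-page segment: an offset/length pair within one page. */
struct seg {
	unsigned int offset; /* byte offset into the page */
	unsigned int len;    /* number of bytes used in the page */
};

/*
 * Return true if the segments describe one hole-free byte range:
 * only the first segment may start at a non-zero offset (the FBO),
 * and only the last segment may end before the page boundary.
 * An empty list is treated as not registrable.
 */
static bool segs_are_contiguous(const struct seg *s, size_t n)
{
	for (size_t i = 0; i < n; i++) {
		/* Each segment must be non-empty and fit inside its page. */
		if (s[i].len == 0 || s[i].offset + s[i].len > PAGE_SZ)
			return false;
		/* A non-zero offset after the first page leaves a hole. */
		if (i > 0 && s[i].offset != 0)
			return false;
		/* Ending short before the last page leaves a hole. */
		if (i + 1 < n && s[i].offset + s[i].len != PAGE_SZ)
			return false;
	}
	return n > 0;
}
```

A list such as {512, 3584}, {0, 4096}, {0, 100} passes this check, while
{0, 4096}, {512, 3584} does not, which is the kind of I/O with holes
mentioned above.
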
> > - we usually want it to be as fast as possible
>
> In the case of file protocols such as NFS/RDMA and SMB Direct, as well
> as block protocols such as iSER, these registrations are set up and
> torn down on a per-I/O basis, in order to protect the data from
> misbehaving peers or misbehaving hardware. So to me as a storage
> protocol provider, "usually" means "always".

Yes. As I said, I haven't actually found anything yet that doesn't fit
the pattern, but the RDMA in-kernel API is such a mess that I didn't
want to put my hand in the fire and say always.

> I totally get where you're coming from, my main question is whether
> it's possible to nail the requirements of some useful common API.
> It has been tried before, shall I say.

Do you have any information on these attempts and why they failed? Note
that the only interesting ones would be for in-kernel consumers.
Userspace verbs are another order of magnitude more problems, so they're
not too interesting.