On Wed, Jul 15, 2015 at 12:32:33AM -0700, Christoph Hellwig wrote: > int rdma_create_mr(struct ib_pd *pd, enum rdma_mr_type mr, > u32 max_pages, int flags); > > > * array from a SG list > > * @mr: memory region > > * @sg: sg list > > * @sg_nents: number of elements in the sg > > * > > * Can fail if the HW is not able to register this > > * sg list. In case of failure - caller is responsible > > * to handle it (bounce-buffer, multiple registrations...) > > */ > > int ib_mr_set_sg(struct ib_mr *mr, > > struct scatterlist *sg, > > unsigned short sg_nents); > > Call this rdma_map_sg? > > > /* register the MR */ > > frwr.opcode = IB_WR_FAST_REG_MR; > > frwr.wrid = my_wrid; > > frwr.wr.fast_reg.mr = mr; > > frwr.wr.fast_reg.iova = ib_sg_dma_adress(&sg[0]); > > frwr.wr.fast_reg.length = length; > > frwr.wr.fast_reg.access_flags = my_flags; > > Provide a helper to hide all this behind the scenes please: > > void rdma_init_mr_wr(struct ib_send_wr *wr, struct rdma_mr *mr, > u64 wr_id, int mr_access_flags); > > Or if we got with Jason's suggestion split "int mr_access_flags" into > "bool remote, bool is_write". Yes please. Considering the security implications we need to be much more careful API wise here. This is more of a code-as-documentation issue than a functional issue. Lets avoid the issue of implicit posting, but still delegate the WR construction to the driver: rdma_map_sg_lkey(u32 *lkey_out, struct rdma_mr *mr, const struct scatterlist *sg, unsigned short sg_nents, unsigned int ops_supported, struct ib_send_wr *post_wr, u64 wrid) rdma_map_sg_rkey(.. same args ..) Used as: rdma_map_sg_lkey(&wr[1].lkey,mr,sg,sg_nents,RDMA_OP_SEND, &wr[0],...); wr[1].opcode = IB_SEND; ib_post_send(wr..) I'd probably implement the above as two wrappers around a generic driver/core callback. All the wrappers do is translation of ops_supports to the IB_ACCESS scheme. The two entry points interpret their ops_supported differently (source/sink in Steve's model), so it becomes impossible to inadvertantly create a remote access lkey. And the API makes it unnatural to use a MR as both a lkey and rkey. Just thinking .. I'd probably drop the wrid and have rdma_map_sg_x create an unsignaled wr by default. If the caller wants to force signaling they can fill in the write and change the flags. --------- I'm still unhappy with this, from a broad perspective. Look at what any ULP has to implement to do a RDMA READ properly: if (iwarp) { rdma_map_sg_lkey(&wr[1].lkey,mr,sg,sg_nents,RDMA_OP_RDMA_READ, &wr[0],...); wr[1].opcode = IB_RDMA_READ; wr[2].opcode = IB_WR_LOCAL_INV; wr[2].invalidate_rkey = wr[1].lkey; } else if (sg_nents <= device->read_sg_limit) { wr[0].opcode = IB_RDMA_READ; wr[0].lkey = pd->local_dma_lkey; ... translate struct sctatter_list to a wr ... } else if (fmr) { ... ? ... } } else { // For IB rdma_map_sg_lkey(&wr[1].lkey,mr,sg,sg_nents,RDMA_OP_RDMA_READ, &wr[0],...); wr[1].opcode = IB_RDMA_READ; // We can lazy invalidate when we recycle the MR (?) } And that isn't even considering the possibility of using multiple RDMA_READ ops instead of a MR. (which would be smarter that FMR) I see the above as a common, mandatory, pattern that should be factored.. If a ULP is doing RDMA_READ and it isn't doing the above, it is broken. It either doesn't support iWarp, or it is throwing IB performance away. Jason -- To unsubscribe from this list: send the line "unsubscribe linux-rdma" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html