On 7/8/2015 3:08 PM, Jason Gunthorpe wrote:
> The MR stuff was never really designed, the HW people provided some capability and the SW side just raw exposed it, thoughtlessly.
Jason, I don't disagree that the API can be improved. I have some responses to your statements below though.
> Why is code using iWarp calling ib_get_dma_mr with RDMA_MRR_READ_DEST/IB_ACCESS_REMOTE_WRITE? That is insecure.
Because the iWARP protocol requires it, which was very much an intentional decision. It actually is not insecure, as discussed in detail in RFC 5042. However, it is different from InfiniBand.
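To make the difference concrete, here is a minimal user-space sketch (not kernel code) of the access rights the local sink buffer of an RDMA Read needs on each transport. The names `xport`, `ACC_LOCAL_WRITE`, `ACC_REMOTE_WRITE`, and `read_sink_access` are hypothetical and only loosely mirror the kernel's `IB_ACCESS_*` flags. It assumes the only difference is that iWARP places the Read Response through a tagged buffer that must carry remote-write rights.

```c
/* Hypothetical transport identifiers, for illustration only. */
enum xport { XPORT_IB, XPORT_IWARP };

/* Hypothetical access bits, modeled on the IB_ACCESS_* naming. */
#define ACC_LOCAL_WRITE  (1u << 0)
#define ACC_REMOTE_WRITE (1u << 1)

/*
 * Access rights needed on the local buffer that receives RDMA Read data.
 * Assumption: InfiniBand needs only local write access, while iWARP also
 * needs remote write access on the sink buffer.
 */
static unsigned int read_sink_access(enum xport t)
{
    unsigned int acc = ACC_LOCAL_WRITE;

    if (t == XPORT_IWARP)
        acc |= ACC_REMOTE_WRITE;
    return acc;
}
```

A caller would pass the result to whatever registration routine it uses, so the transport check lives in one place instead of in every upper layer.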
> Why on earth is NFS using frmr to create RDMA READ lkeys?
Because NFS desires to have a single code path that works for all transports. In addition, using the IB dma_mr as an lkey means that large I/O (common with NFS) would require multiple RDMA Read operations whenever the page list exceeds the local scatter/gather limit of the device.
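As a rough illustration of the scatter/gather point, the sketch below counts how many RDMA Read requests a transfer needs when each request can carry at most `max_sge` local entries. The function name and the one-entry-per-page assumption are mine, not taken from any driver. With an frmr the whole page list sits behind a single lkey, so one entry (and one Read) is enough.

```c
#include <stddef.h>

/*
 * Number of RDMA Read requests needed to cover npages pages when each
 * request is limited to max_sge local scatter/gather entries.
 * Assumption: pages are not physically contiguous, so each page consumes
 * one entry. Returns 0 for an empty transfer or an invalid limit.
 */
static size_t rdma_reads_needed(size_t npages, size_t max_sge)
{
    if (npages == 0 || max_sge == 0)
        return 0;
    return (npages + max_sge - 1) / max_sge;  /* ceiling division */
}
```

For example, a 1 MiB read with 4 KiB pages is 256 pages. With a hypothetical device limit of 30 entries, `rdma_reads_needed(256, 30)` gives 9 Reads, against a single Read when the buffer is registered as one region.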
> I think when you do that, it quickly becomes clear that iWarp's problem is not a seemingly minor issue with different flag bits, but that iWarp *cannot* use local_dma_lkey as an RDMA READ local buffer. Using ib_get_dma_mr(IB_ACCESS_REMOTE_WRITE) is an insecure workaround. So iWarp (and only iWarp) should take the pain of spinning up temporary local MRs for RDMA READ.
That is entirely the wrong layer to address this. It would prevent reuse of MRs, and require upper layers to be aware that this was the case - which is exactly the opposite of what you are trying to achieve.
> This should all be hidden under a common API and it shouldn't be sprinkled all over the place in NFS, iSER, etc.
Are you arguing that all upper layer storage protocols take a single common approach to memory registration? That is a very different discussion.

Tom.