Hi Roman and the team,

On 02/02/2018 04:08 PM, Roman Pen wrote:
This series introduces IBNBD/IBTRS modules. IBTRS (InfiniBand Transport) is a reliable high-speed transport library which allows for establishing a connection between client and server machines via RDMA.
So it's not strictly InfiniBand, correct?

It is optimized to transfer (read/write) IO blocks in the sense that it follows the BIO semantics of providing the possibility to either write data from a scatter-gather list to the remote side or to request ("read") data transfer from the remote side into a given set of buffers. IBTRS is multipath capable and provides I/O fail-over and load-balancing functionality.
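Just to make sure I'm reading the transport semantics right, below is how I picture the client-side interface. This is purely my own illustrative sketch; the names and layout are invented and not taken from the patches:

/*
 * Hypothetical sketch only -- names and types are mine,
 * not the actual IBTRS API.
 */
#include <linux/scatterlist.h>

struct xfer_session;		/* an established client<->server connection */

enum xfer_dir {
	XFER_WRITE,		/* push the local sg list to the remote side */
	XFER_READ,		/* pull remote data into the local sg list   */
};

/*
 * One transfer, BIO-like: a direction, a scatter-gather list
 * describing the local buffers, and a completion callback that
 * fires when the RDMA transfer finishes.
 */
struct xfer_request {
	enum xfer_dir		 dir;
	struct scatterlist	*sg;
	unsigned int		 sg_cnt;
	void			(*done)(struct xfer_request *req, int err);
	void			*private;
};

/* Submit on one of the established paths; multipath is hidden below this. */
int xfer_submit(struct xfer_session *sess, struct xfer_request *req);

If the real interface differs substantially (e.g. in the completion model or how the sg list is expressed), it would help to spell that out in the cover letter.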
Couple of questions on your multipath implementation:

1. What was your main objective over dm-multipath?

2. What was the consideration behind this implementation over creating a stand-alone bio-based device node to reinject the bio to the original block device (roughly as in the sketch below)?
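To clarify what I mean in question 2: a thin bio-based stacking driver whose make_request function just picks a path and reinjects the bio into the block layer, along these lines (untested sketch against a ~4.15-era block API):

/*
 * Untested sketch of a bio-based multipath "reinjection" driver.
 * mp_make_request() would be registered on the multipath device's
 * queue with blk_queue_make_request().
 */
#include <linux/kernel.h>
#include <linux/blkdev.h>
#include <linux/bio.h>
#include <linux/atomic.h>

struct mp_dev {
	struct block_device	*paths[2];	/* underlying devices  */
	atomic_t		 rr;		/* round-robin counter */
};

static blk_qc_t mp_make_request(struct request_queue *q, struct bio *bio)
{
	struct mp_dev *mp = q->queuedata;
	unsigned int i = atomic_inc_return(&mp->rr) % ARRAY_SIZE(mp->paths);

	/* Retarget the bio at the chosen underlying device ... */
	bio_set_dev(bio, mp->paths[i]);

	/* ... and reinject it into the block layer. */
	return generic_make_request(bio);
}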
IBNBD (InfiniBand Network Block Device) is a pair of kernel modules (client and server) that allow for remote access of a block device on the server over the IBTRS protocol. After being mapped, the remote block devices can be accessed on the client side as local block devices. Internally IBNBD uses IBTRS as an RDMA transport library.

Why?

- IBNBD/IBTRS is developed in order to map thin-provisioned volumes, thus the internal protocol is simple and consists of only a few request types, without awareness of the underlying hardware devices.
Can you explain how the protocol is developed for thin-p? What is the essence of how it's suited for it?
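To have something concrete to discuss: is the wire protocol roughly a minimal opcode set like the one below? This is just my guess, not taken from the patches:

/* Purely hypothetical wire-protocol sketch, not the IBNBD protocol. */
#include <linux/types.h>

enum proto_opcode {
	PROTO_OPEN	= 1,	/* map a remote block device          */
	PROTO_CLOSE	= 2,	/* unmap it                           */
	PROTO_READ	= 3,	/* transfer remote blocks to client   */
	PROTO_WRITE	= 4,	/* transfer client blocks to server   */
};

struct proto_msg_hdr {
	__le16	opcode;		/* one of enum proto_opcode           */
	__le16	flags;
	__le32	device_id;	/* handle returned by PROTO_OPEN      */
	__le64	sector;		/* offset in 512-byte sectors         */
	__le32	len;		/* transfer length in bytes           */
} __packed;

If it really is that small, I'd like to understand which part of it is specific to thin provisioning -- for example, whether there is a discard/unmap request or any allocation awareness on the server side.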
- IBTRS was developed as an independent RDMA transport library, which supports fail-over and load-balancing policies using multipath, thus it can be used for any other IO needs rather than only for block devices.
What do you mean by "any other IO"?
- IBNBD/IBTRS is faster than NVMe over RDMA. Old comparison results: https://www.spinics.net/lists/linux-rdma/msg48799.html (I retested on the latest 4.14 kernel - there is no significant difference, thus I post the old link).
That is interesting to learn. Reading your reference brings up a couple of questions though:

- It's unclear to me how ibnbd performs reads without performing memory registration. Is it using the global dma rkey?

- It's unclear to me why there is a difference for noreg in writes, because for small writes nvme-rdma never registers memory (it uses inline data).

- Looks like with nvme-rdma you max out your iops at 1.6 MIOPs; that seems considerably low against other reports. Can you try and explain what the bottleneck was? This can be a potential bug, and I (and the rest of the community) would be interested in knowing more details.

- The srp/scst comparison is really not fair having it in legacy request mode. Can you please repeat it and report a bug to either linux-rdma or to the scst mailing list?

- Your latency measurements are surprisingly high for a null target device (even for a low-end nvme device, actually), regardless of the transport implementation. For example:
  - QD=1 read latency is 648.95 for ibnbd (I assume usecs, right?), which is fairly high. On nvme-rdma it's 1058 us, which means over 1 millisecond, and even 1.254 ms for srp. Last time I tested nvme-rdma read QD=1 latency I got ~14 us. So something does not add up here. If this is not some configuration issue, then we have serious bugs to handle..
  - At QD=16 the read latencies are > 10 ms for null devices?! I'm having trouble understanding how you were able to get such high latencies (> 100 ms for QD>=100).

Can you share more information about your setup? It would really help us understand more.
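To put numbers on why the latency figures look off to me: with the queue kept full, outstanding IOs, IOPS and mean latency are tied together by Little's law (QD ~= IOPS * latency), so the reported figures should at least be self-consistent. A quick sanity check (plain userspace C, latencies taken from the report and from my earlier nvme-rdma measurement):

/* Little's law sanity check: IOPS ~= QD / mean latency (in seconds). */
#include <stdio.h>

static double implied_iops(double qd, double lat_usec)
{
	return qd / (lat_usec / 1e6);
}

int main(void)
{
	/* ~14 us at QD=1 is what I've previously seen for nvme-rdma on a null target */
	printf("QD=1,     14 us -> %8.0f IOPS\n", implied_iops(1, 14));
	/* the reported numbers */
	printf("QD=1,    649 us -> %8.0f IOPS\n", implied_iops(1, 649));
	printf("QD=16, 10000 us -> %8.0f IOPS\n", implied_iops(16, 10000));
	return 0;
}

QD=16 at >10 ms implies only about 1600 IOPS per job, which is hard to reconcile with MIOPs-level throughput unless a very large number of jobs was used -- so I suspect either a configuration issue or that the latency column means something different from what I assume.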
- Major parts of the code were rewritten and simplified, and the overall code size was reduced by a quarter.
That is good to know.