Hi Leon, thanks for the feedback!

On Tue, Jul 9, 2019 at 1:00 PM Leon Romanovsky <leon@xxxxxxxxxx> wrote:
>
> On Tue, Jul 09, 2019 at 11:55:03AM +0200, Danil Kipnis wrote:
> > Hallo Doug, Hallo Jason, Hallo Jens, Hallo Greg,
> >
> > Could you please provide some feedback to the IBNBD driver and the
> > IBTRS library?
> > So far we addressed all the requests provided by the community and
> > continue to maintain our code up-to-date with the upstream kernel
> > while having an extra compatibility layer for older kernels in our
> > out-of-tree repository.
> > I understand that SRP and NVMEoF which are in the kernel already do
> > provide equivalent functionality for the majority of the use cases.
> > IBNBD on the other hand is showing higher performance and more
> > importantly includes the IBTRS - a general purpose library to
> > establish connections and transport BIO-like read/write sg-lists over
> > RDMA, while SRP is targeting SCSI and NVMEoF is addressing NVME. While
> > I believe IBNBD does meet the kernel coding standards, it doesn't have
> > a lot of users, while SRP and NVMEoF are widely accepted. Do you think
> > it would make sense for us to rework our patchset and try pushing it
> > for staging tree first, so that we can proof IBNBD is well maintained,
> > beneficial for the eco-system, find a proper location for it within
> > block/rdma subsystems? This would make it easier for people to try it
> > out and would also be a huge step for us in terms of maintenance
> > effort.
> > The names IBNBD and IBTRS are in fact misleading. IBTRS sits on top of
> > RDMA and is not bound to IB (We will evaluate IBTRS with ROCE in the
> > near future). Do you think it would make sense to rename the driver to
> > RNBD/RTRS?
>
> It is better to avoid "staging" tree, because it will lack attention of
> relevant people and your efforts will be lost once you will try to move
> out of staging. We are all remembering Lustre and don't want to see it
> again.
>
> Back then, you was asked to provide support for performance superiority.

I only have theories as to why IBNBD shows better numbers than NVMEoF:

1. The way we utilize the MQ framework in IBNBD. We promise to have
queue_depth (say 512) requests on each of the num_cpus hardware queues
of each device, but in fact we only have queue_depth for the whole
"session" towards a given server. The moment we have queue_depth
requests in flight, we have to stop the hardware queue (of the device,
on the CPU) on which the next request arrives, and start the stopped
queues again once some requests have completed. To do this we maintain
per-CPU lists of stopped HW queues, a bitmap showing which lists are
not empty, etc., and wake the queues up in a round-robin fashion so
that no device gets starved (a rough sketch of this bookkeeping follows
below).

2. We only do RDMA writes with immediate data. The server reserves
queue_depth buffers of max_io_size each for a given client. The client
manages those buffers itself: it uses the imm field to tell the server
which buffer has been written (and where), and the server uses the imm
field to send back the errno. If max_io_size is 64K, queue_depth is 512
and the client only ever issues 4K IOs, then 512 * 60K (roughly 30M) of
that memory is wasted. On the other hand, we do no buffer
allocation/registration in the I/O path on the server side: the server
sends the RDMA addresses and keys of those preregistered buffers at
connection establishment and deallocates/unregisters them when the
session is closed. That is the write path. For reads, the client
registers the user buffers (via fast registration) and sends their
addresses and keys to the server (with an RDMA write with imm); the
server then RDMA-writes into those buffers, and the client does the
unregistering/invalidation and completes the request (a simplified
sketch of this buffer handshake is in the P.S. at the end of this
mail).
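To make the first point a bit more concrete, here is a rough sketch of
the kind of per-session accounting I mean. This is not the actual IBNBD
code; all names (ex_session, ex_get_slot, ...) are made up for
illustration, and locking, error handling and the blk-mq integration
are left out:

#include <linux/atomic.h>
#include <linux/bitops.h>
#include <linux/list.h>
#include <linux/smp.h>

/*
 * Illustrative only: one budget of queue_depth requests shared by all
 * devices/HW queues of a session, plus per-CPU lists of stopped queues
 * that are woken round-robin on completion.
 */
struct ex_session {
	atomic_t	  inflight;	/* requests currently on the wire    */
	int		  queue_depth;	/* budget for the whole session      */
	struct list_head *stopped;	/* per-CPU lists of stopped queues   */
	unsigned long	 *cpu_bmap;	/* bit set if stopped[cpu] non-empty */
	int		  last_cpu;	/* round-robin cursor for wake-ups   */
};

struct ex_stopped_queue {
	struct list_head  entry;
	void		 *hctx;		/* opaque handle of the HW queue     */
};

/* Try to send one more request; if the session budget is exhausted,
 * park the HW queue on this CPU's stopped list and tell the caller to
 * stop the queue. (A real implementation needs a lock or a re-check
 * here to close the race against completions.) */
static bool ex_get_slot(struct ex_session *s, struct ex_stopped_queue *q)
{
	if (atomic_inc_return(&s->inflight) <= s->queue_depth)
		return true;

	atomic_dec(&s->inflight);
	list_add_tail(&q->entry, &s->stopped[raw_smp_processor_id()]);
	set_bit(raw_smp_processor_id(), s->cpu_bmap);
	return false;	/* caller stops the HW queue */
}

/* On completion: free one slot and wake a stopped queue, scanning the
 * per-CPU lists round-robin so that no device/CPU is starved. */
static void ex_put_slot(struct ex_session *s, int nr_cpus)
{
	int cpu = s->last_cpu, i;

	atomic_dec(&s->inflight);
	for (i = 0; i < nr_cpus; i++) {
		cpu = (cpu + 1) % nr_cpus;
		if (!test_bit(cpu, s->cpu_bmap))
			continue;
		/* restart the first queue on stopped[cpu] here and
		 * clear the bit once the list drains (omitted) */
		s->last_cpu = cpu;
		break;
	}
}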
> Can you please share any numbers with us?

Apart from GitHub
(https://github.com/ionos-enterprise/ibnbd/tree/master/performance/v4-v5.2-rc3),
the performance results for v5.2-rc3 on two different systems can be
accessed at dcd.ionos.com/ibnbd-performance-report. The page allows
filtering the test scenarios of interest for comparison.

>
> Thanks
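P.S.: In case it helps when looking at the code, here is a very
simplified sketch of the buffer handshake described in point 2 above.
Again, the names and the exact imm layout are made up for illustration
and do not claim to match the IBTRS wire format:

#include <linux/types.h>

/*
 * Illustrative only: the server preallocates and preregisters
 * queue_depth buffers of max_io_size per session and hands their
 * addresses/rkeys to the client once, at connection establishment.
 * For a write, the client picks a free buffer, RDMA-writes the request
 * into it and identifies the buffer in the 32-bit immediate; the
 * server answers with an immediate carrying the same index plus an
 * errno. Nothing is allocated or registered on the server in the I/O
 * path.
 */
struct ex_srv_buf {
	u64	addr;		/* RDMA address sent to the client once */
	u32	rkey;		/* rkey sent to the client once         */
};

struct ex_srv_session {
	u32			 queue_depth;	/* e.g. 512            */
	u32			 max_io_size;	/* e.g. 64K            */
	struct ex_srv_buf	*bufs;		/* queue_depth entries */
};

/* client -> server: RDMA WRITE with imm, imm identifies the buffer */
static u32 ex_imm_for_req(u32 buf_idx)
{
	return buf_idx;
}

/* server -> client: imm carries the buffer index and the errno
 * (the 16/16 split is a made-up layout, just for illustration) */
static u32 ex_imm_for_rsp(u32 buf_idx, int err)
{
	return ((u32)(-err) << 16) | (buf_idx & 0xffff);
}

static void ex_parse_rsp(u32 imm, u32 *buf_idx, int *err)
{
	*buf_idx = imm & 0xffff;
	*err = -(int)(imm >> 16);
}

With queue_depth = 512 and max_io_size = 64K such a pool is 32M per
session, and a client that only ever issues 4K IOs leaves roughly 30M
of it untouched, which is exactly the memory trade-off mentioned in
point 2.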