> On Oct 29, 2017, at 12:38 PM, idanb@xxxxxxxxxxxx wrote:
> 
> From: Idan Burstein <idanb@xxxxxxxxxxxx>
> 
> NVMe over Fabrics in its secure "register_always" mode registers and
> invalidates the user buffer upon each IO. The protocol enables the
> host to request that the subsystem use the SEND WITH INVALIDATE
> operation when returning the response capsule, invalidating the local
> key (remote_invalidation). In some HW implementations, the local
> network adapter may perform better when using local invalidation
> operations.
> 
> The results below show that running with local invalidation rather
> than remote invalidation improves the IOPS achievable with the
> ConnectX-5 Ex network adapter by a factor of 1.36. Nevertheless,
> local invalidation induces more CPU overhead than letting the target
> invalidate remotely; because of this CPU% vs. IOPS tradeoff, we
> propose a module parameter that controls whether to request remote
> invalidation.
> 
> The following results were taken against a single NVMe over Fabrics
> subsystem with a single namespace backed by null_blk:
> 
> Block Size   s/g reg_wr       inline reg_wr      inline reg_wr + local inv
> ++++++++++   ++++++++++++++   ++++++++++++++++   +++++++++++++++++++++++++
> 512B         1446.6K/8.57%    5224.2K/76.21%     7143.3K/79.72%
> 1KB          1390.6K/8.5%     4656.7K/71.69%     5860.6K/55.45%
> 2KB          1343.8K/8.6%     3410.3K/38.96%     4106.7K/55.82%
> 4KB          1254.8K/8.39%    2033.6K/15.86%     2165.3K/17.48%
> 8KB          1079.5K/7.7%     1143.1K/7.27%      1158.2K/7.33%
> 16KB         603.4K/3.64%     593.8K/3.4%        588.9K/3.77%
> 32KB         294.8K/2.04%     293.7K/1.98%       294.4K/2.93%
> 64KB         138.2K/1.32%     141.6K/1.26%       135.6K/1.34%

Units reported here are KIOPS and %CPU?

What was the benchmark?

Was any root cause analysis done to understand why the IOPS rate
changes without RI? Reduction in avg RTT? Fewer long-running
outliers? Lock contention in the ULP?

I am curious enough to add a similar setting to NFS/RDMA, now that
I have mlx5 devices.


> Signed-off-by: Max Gurtovoy <maxg@xxxxxxxxxxxx>
> Signed-off-by: Idan Burstein <idanb@xxxxxxxxxxxx>
> ---
>  drivers/nvme/host/rdma.c | 10 ++++++++--
>  1 file changed, 8 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/nvme/host/rdma.c b/drivers/nvme/host/rdma.c
> index 92a03ff..7f8225d 100644
> --- a/drivers/nvme/host/rdma.c
> +++ b/drivers/nvme/host/rdma.c
> @@ -146,6 +146,11 @@ static inline struct nvme_rdma_ctrl *to_rdma_ctrl(struct nvme_ctrl *ctrl)
>  MODULE_PARM_DESC(register_always,
>  	 "Use memory registration even for contiguous memory regions");
>  
> +static bool remote_invalidation = true;
> +module_param(remote_invalidation, bool, 0444);
> +MODULE_PARM_DESC(remote_invalidation,
> +	 "request remote invalidation from subsystem (default: true)");

The use of a module parameter would be awkward in systems that have
a heterogeneous collection of HCAs.
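Just to illustrate, something along these lines would let the choice
follow the device rather than the module (a rough sketch only;
ib_device_prefers_local_inv() is a made-up helper standing in for
whatever per-HCA hint would actually carry this information):

    /* hypothetical per-device policy instead of a global parameter */
    static bool nvme_rdma_use_remote_inv(struct nvme_rdma_queue *queue)
    {
            struct ib_device *ibdev = queue->device->dev;

            /* made-up helper; a real patch needs a real source for this */
            return !ib_device_prefers_local_inv(ibdev);
    }

and then in nvme_rdma_map_sg_fr():

            sg->type = NVME_KEY_SGL_FMT_DATA_DESC << 4;
            if (nvme_rdma_use_remote_inv(queue))
                    sg->type |= NVME_SGL_FMT_INVALIDATE;

That way a host with mixed adapters gets sensible behavior on each of
them without an administrator having to pick one global setting.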
> +
>  static int nvme_rdma_cm_handler(struct rdma_cm_id *cm_id,
>  		struct rdma_cm_event *event);
>  static void nvme_rdma_recv_done(struct ib_cq *cq, struct ib_wc *wc);
> @@ -1152,8 +1157,9 @@ static int nvme_rdma_map_sg_fr(struct nvme_rdma_queue *queue,
>  	sg->addr = cpu_to_le64(req->mr->iova);
>  	put_unaligned_le24(req->mr->length, sg->length);
>  	put_unaligned_le32(req->mr->rkey, sg->key);
> -	sg->type = (NVME_KEY_SGL_FMT_DATA_DESC << 4) |
> -			NVME_SGL_FMT_INVALIDATE;
> +	sg->type = NVME_KEY_SGL_FMT_DATA_DESC << 4;
> +	if (remote_invalidation)
> +		sg->type |= NVME_SGL_FMT_INVALIDATE;
>  
>  	return 0;
>  }
> -- 
> 1.8.3.1

--
Chuck Lever