On Fri, Mar 27, 2020 at 08:15:36PM +0300, Max Gurtovoy wrote:
> For capable HCAs (e.g. ConnectX-5/ConnectX-6) this will allow end-to-end
> protection information passthrough and validation for NVMe over RDMA
> transport. Metadata offload support was implemented over the new RDMA
> signature verbs API and it is enabled per controller by using nvme-cli.
>
> usage example:
> nvme connect --pi_enable --transport=rdma --traddr=10.0.1.1 --nqn=test-nvme
>
> Signed-off-by: Max Gurtovoy <maxg@xxxxxxxxxxxx>
> Signed-off-by: Israel Rukshin <israelr@xxxxxxxxxxxx>
> ---
>  drivers/nvme/host/rdma.c | 330 ++++++++++++++++++++++++++++++++++++++++++-----
>  1 file changed, 296 insertions(+), 34 deletions(-)
>
> diff --git a/drivers/nvme/host/rdma.c b/drivers/nvme/host/rdma.c
> index e38f8f7..23cc77e 100644
> --- a/drivers/nvme/host/rdma.c
> +++ b/drivers/nvme/host/rdma.c
> @@ -67,6 +67,9 @@ struct nvme_rdma_request {
>         struct ib_cqe           reg_cqe;
>         struct nvme_rdma_queue  *queue;
>         struct nvme_rdma_sgl    data_sgl;
> +       /* Metadata (T10-PI) support */
> +       struct nvme_rdma_sgl    *md_sgl;
> +       bool                    use_md;

Do we need a use_md flag vs just using md_sgl as a boolean and/or using
blk_integrity_rq?

> enum nvme_rdma_queue_flags {
> @@ -88,6 +91,7 @@ struct nvme_rdma_queue {
>         struct rdma_cm_id       *cm_id;
>         int                     cm_error;
>         struct completion       cm_done;
> +       bool                    pi_support;

Why do we need this on a per-queue basis vs always checking the
controller?

> +       u32 max_page_list_len =
> +               pi_support ? ibdev->attrs.max_pi_fast_reg_page_list_len :
> +                            ibdev->attrs.max_fast_reg_page_list_len;
> +
> +       return min_t(u32, NVME_RDMA_MAX_SEGMENTS, max_page_list_len - 1);

Can you use a good old if / else here?

> +#ifdef CONFIG_BLK_DEV_INTEGRITY
> +static void nvme_rdma_set_sig_domain(struct blk_integrity *bi,
> +               struct nvme_command *cmd, struct ib_sig_domain *domain,
> +               u16 control)
>  {
> +       domain->sig_type = IB_SIG_TYPE_T10_DIF;
> +       domain->sig.dif.bg_type = IB_T10DIF_CRC;
> +       domain->sig.dif.pi_interval = 1 << bi->interval_exp;
> +       domain->sig.dif.ref_tag = le32_to_cpu(cmd->rw.reftag);
>
>         /*
> +        * At the moment we hard code those, but in the future
> +        * we will take them from cmd.

I don't understand this comment. Also it doesn't use up all 80 chars.

> +static void nvme_rdma_set_sig_attrs(struct blk_integrity *bi,
> +               struct nvme_command *cmd, struct ib_sig_attrs *sig_attrs)
> +{
> +       u16 control = le16_to_cpu(cmd->rw.control);
> +
> +       WARN_ON(bi == NULL);

I think this WARN_ON is pointless, as we'll get a NULL pointer
dereference a little later anyway.

> +mr_put:
> +       if (req->use_md)
> +               ib_mr_pool_put(queue->qp, &queue->qp->sig_mrs, req->mr);
> +       else
> +               ib_mr_pool_put(queue->qp, &queue->qp->rdma_mrs, req->mr);

I've seen this pattern a few times, maybe a little helper to return the
right mr pool for a request?

> +       if (blk_integrity_rq(rq)) {
> +               memset(req->md_sgl, 0, sizeof(struct nvme_rdma_sgl));

Why do we need this memset?
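
To make the use_md question above concrete, here is a rough, untested
sketch of deriving the information from the request instead of caching a
per-request flag.  The helper name is made up, and it assumes use_md is
only ever set for requests that actually carry integrity data on a
PI-enabled queue:

static bool nvme_rdma_rq_has_md(struct nvme_rdma_queue *queue,
		struct request *rq)
{
	/* derived on the fly, no cached per-request flag needed */
	return queue->pi_support && blk_integrity_rq(rq);
}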
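
For the if / else comment, something along these lines (untested) is all
that is meant:

	u32 max_page_list_len;

	if (pi_support)
		max_page_list_len = ibdev->attrs.max_pi_fast_reg_page_list_len;
	else
		max_page_list_len = ibdev->attrs.max_fast_reg_page_list_len;

	return min_t(u32, NVME_RDMA_MAX_SEGMENTS, max_page_list_len - 1);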
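
And for the mr pool pattern, a rough sketch of such a helper (untested;
the name nvme_rdma_mr_pool is just an example):

static struct list_head *nvme_rdma_mr_pool(struct nvme_rdma_queue *queue,
		struct nvme_rdma_request *req)
{
	/* signature MRs live in their own pool on the qp */
	if (req->use_md)
		return &queue->qp->sig_mrs;
	return &queue->qp->rdma_mrs;
}

The callers would then become
ib_mr_pool_get(queue->qp, nvme_rdma_mr_pool(queue, req)) and
ib_mr_pool_put(queue->qp, nvme_rdma_mr_pool(queue, req), req->mr).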