On Fri, Mar 27, 2020 at 08:15:36PM +0300, Max Gurtovoy wrote:
> For capable HCAs (e.g. ConnectX-5/ConnectX-6) this will allow end-to-end
> protection information passthrough and validation for NVMe over RDMA
> transport. Metadata offload support was implemented over the new RDMA
> signature verbs API and it is enabled per controller by using nvme-cli.
>
> usage example:
> nvme connect --pi_enable --transport=rdma --traddr=10.0.1.1 --nqn=test-nvme
>
> Signed-off-by: Max Gurtovoy <maxg@xxxxxxxxxxxx>
> Signed-off-by: Israel Rukshin <israelr@xxxxxxxxxxxx>
> ---
>  drivers/nvme/host/rdma.c | 330 ++++++++++++++++++++++++++++++++++++++++++-----
>  1 file changed, 296 insertions(+), 34 deletions(-)
>
> diff --git a/drivers/nvme/host/rdma.c b/drivers/nvme/host/rdma.c
> index e38f8f7..23cc77e 100644
> --- a/drivers/nvme/host/rdma.c
> +++ b/drivers/nvme/host/rdma.c
> @@ -67,6 +67,9 @@ struct nvme_rdma_request {
>         struct ib_cqe           reg_cqe;
>         struct nvme_rdma_queue  *queue;
>         struct nvme_rdma_sgl    data_sgl;
> +       /* Metadata (T10-PI) support */
> +       struct nvme_rdma_sgl    *md_sgl;
> +       bool                    use_md;

Do we need a use_md flag vs just using md_sgl as a boolean and/or using
blk_integrity_rq?

> enum nvme_rdma_queue_flags {
> @@ -88,6 +91,7 @@ struct nvme_rdma_queue {
>         struct rdma_cm_id       *cm_id;
>         int                     cm_error;
>         struct completion       cm_done;
> +       bool                    pi_support;

Why do we need this on a per-queue basis vs always checking the
controller?

> +       u32 max_page_list_len =
> +               pi_support ? ibdev->attrs.max_pi_fast_reg_page_list_len :
> +                            ibdev->attrs.max_fast_reg_page_list_len;
> +
> +       return min_t(u32, NVME_RDMA_MAX_SEGMENTS, max_page_list_len - 1);

Can you use a good old if / else here?

> +#ifdef CONFIG_BLK_DEV_INTEGRITY
> +static void nvme_rdma_set_sig_domain(struct blk_integrity *bi,
> +               struct nvme_command *cmd, struct ib_sig_domain *domain,
> +               u16 control)
>  {
> +       domain->sig_type = IB_SIG_TYPE_T10_DIF;
> +       domain->sig.dif.bg_type = IB_T10DIF_CRC;
> +       domain->sig.dif.pi_interval = 1 << bi->interval_exp;
> +       domain->sig.dif.ref_tag = le32_to_cpu(cmd->rw.reftag);
>
>         /*
> +        * At the moment we hard code those, but in the future
> +        * we will take them from cmd.

I don't understand this comment. Also it doesn't use up all 80 chars.

> +static void nvme_rdma_set_sig_attrs(struct blk_integrity *bi,
> +               struct nvme_command *cmd, struct ib_sig_attrs *sig_attrs)
> +{
> +       u16 control = le16_to_cpu(cmd->rw.control);
> +
> +       WARN_ON(bi == NULL);

I think this WARN_ON is pointless, as we'll get a NULL pointer
dereference a little later anyway.

> +mr_put:
> +       if (req->use_md)
> +               ib_mr_pool_put(queue->qp, &queue->qp->sig_mrs, req->mr);
> +       else
> +               ib_mr_pool_put(queue->qp, &queue->qp->rdma_mrs, req->mr);

I've seen this pattern a few times, maybe a little helper to return the
right mr pool for a request?

> +       if (blk_integrity_rq(rq)) {
> +               memset(req->md_sgl, 0, sizeof(struct nvme_rdma_sgl));

Why do we need this memset?
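
To make the use_md question above concrete, here is a rough, untested
sketch of deriving the information from the request instead of caching a
per-request flag.  The helper name is made up, and it assumes use_md is
only ever set for requests that actually carry integrity data on a
PI-enabled queue:

static bool nvme_rdma_rq_has_md(struct nvme_rdma_queue *queue,
		struct request *rq)
{
	/* derived on the fly, no cached per-request flag needed */
	return queue->pi_support && blk_integrity_rq(rq);
}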
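
For the if / else comment, something along these lines (untested) is all
that is meant:

	u32 max_page_list_len;

	if (pi_support)
		max_page_list_len = ibdev->attrs.max_pi_fast_reg_page_list_len;
	else
		max_page_list_len = ibdev->attrs.max_fast_reg_page_list_len;

	return min_t(u32, NVME_RDMA_MAX_SEGMENTS, max_page_list_len - 1);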
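
And for the mr pool pattern, a rough sketch of such a helper (untested;
the name nvme_rdma_mr_pool is just an example):

static struct list_head *nvme_rdma_mr_pool(struct nvme_rdma_queue *queue,
		struct nvme_rdma_request *req)
{
	/* signature MRs live in their own pool on the qp */
	if (req->use_md)
		return &queue->qp->sig_mrs;
	return &queue->qp->rdma_mrs;
}

The callers would then become
ib_mr_pool_get(queue->qp, nvme_rdma_mr_pool(queue, req)) and
ib_mr_pool_put(queue->qp, nvme_rdma_mr_pool(queue, req), req->mr).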