On 7/19/19 11:54 PM, oulijun wrote:
Hi Bart Van Assche & Doug Ledford,

I am investigating a problem with RoCE and SCSI over RDMA (srpt) in kernel 4.14. After insmod srpt.ko and insmod hns-roce-hw-v2.ko, srpt_add_one reports a warning:

	ib_srpt srpt_add_one(hns_0) failed.

I traced the error to ib_cm_listen in srpt_add_one and found that it returns an error during service_id validation. The relevant code is as follows:

static int __ib_cm_listen(struct ib_cm_id *cm_id, __be64 service_id,
			  __be64 service_mask)
{
	struct cm_id_private *cm_id_priv, *cur_cm_id_priv;
	int ret = 0;

	service_mask = service_mask ? service_mask : ~cpu_to_be64(0);
	service_id &= service_mask;
	if ((service_id & IB_SERVICE_ID_AGN_MASK) == IB_CM_ASSIGN_SERVICE_ID &&
	    (service_id != IB_CM_ASSIGN_SERVICE_ID))
		return -EINVAL;
	......
}

static void srpt_add_one(struct ib_device *device)
{
	struct srpt_device *sdev;
	struct srpt_port *sport;
	int i;

	pr_debug("device = %p\n", device);

	sdev = kzalloc(sizeof(*sdev), GFP_KERNEL);
	if (!sdev)
		goto err;

	sdev->device = device;
	mutex_init(&sdev->sdev_mutex);

	sdev->pd = ib_alloc_pd(device, 0);
	if (IS_ERR(sdev->pd))
		goto free_dev;

	sdev->lkey = sdev->pd->local_dma_lkey;

	sdev->srq_size = min(srpt_srq_size, sdev->device->attrs.max_srq_wr);

	srpt_use_srq(sdev, sdev->port[0].port_attrib.use_srq);

	if (!srpt_service_guid)
		srpt_service_guid = be64_to_cpu(device->node_guid);

	sdev->cm_id = ib_create_cm_id(device, srpt_cm_handler, sdev);
	if (IS_ERR(sdev->cm_id))
		goto err_ring;

	/* print out target login information */
	pr_debug("Target login info: id_ext=%016llx,ioc_guid=%016llx,"
		 "pkey=ffff,service_id=%016llx\n", srpt_service_guid,
		 srpt_service_guid, srpt_service_guid);

	/*
	 * We do not have a consistent service_id (ie. also id_ext of target_id)
	 * to identify this target. We currently use the guid of the first HCA
	 * in the system as service_id; therefore, the target_id will change
	 * if this HCA is gone bad and replaced by different HCA
	 */
	if (ib_cm_listen(sdev->cm_id, cpu_to_be64(srpt_service_guid), 0))
		goto err_cm;
	......
}

However, I checked that srpt_service_guid is obtained from device->node_guid, and I think the algorithm that computes device->node_guid is OK. In addition, I analyzed a patch in kernel 4.17 (IB/srpt: Add RDMA/CM support). Do I understand correctly that the previous srpt did not support RDMA/CM? So will all RoCE devices fail when running srpt.ko on kernel 4.14?

Before commit 63cf1a902c9d ("IB/srpt: Add RDMA/CM support"; v4.17), ib_cm_listen() was called with srpt_service_guid as argument for all RDMA adapters. Since that commit, ib_cm_listen() is only called for ports that have IB as link layer. In other words, I think the failure that you reported only occurs with kernels before v4.17 and not with v4.17 or any later kernel.
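
To make the two conditions above concrete, here is a minimal userspace sketch in C. It is an illustration under stated assumptions, not the kernel code: the constant values are what I recall from include/rdma/ib_cm.h and are written in host byte order for readability, and every name with a sketch_/SKETCH_ prefix is hypothetical. The first function mirrors the quoted service_id check; the second models the post-v4.17 behaviour of only calling ib_cm_listen() for ports with IB as link layer.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Assumed values, shown in host byte order. The kernel defines the real
 * constants as __be64 (IB_SERVICE_ID_AGN_MASK, IB_CM_ASSIGN_SERVICE_ID).
 */
#define SKETCH_SERVICE_ID_AGN_MASK  0xFF00000000000000ULL
#define SKETCH_CM_ASSIGN_SERVICE_ID 0x0200000000000000ULL

/* Hypothetical stand-in for the port link layer. */
enum sketch_link_layer {
	SKETCH_LINK_IB,
	SKETCH_LINK_ETHERNET,	/* RoCE */
};

/*
 * Mirrors the check quoted from __ib_cm_listen(): returns true when the
 * service ID would be rejected with -EINVAL. An ID is rejected when its
 * top byte equals the "assign" prefix but the ID as a whole is not exactly
 * the "assign" value.
 */
bool sketch_service_id_rejected(uint64_t service_id, uint64_t service_mask)
{
	service_mask = service_mask ? service_mask : ~0ULL;
	service_id &= service_mask;

	return (service_id & SKETCH_SERVICE_ID_AGN_MASK) ==
		       SKETCH_CM_ASSIGN_SERVICE_ID &&
	       service_id != SKETCH_CM_ASSIGN_SERVICE_ID;
}

/*
 * Models the behaviour described for v4.17 and later: the IB/CM listen is
 * only attempted when the port's link layer is IB.
 */
bool sketch_should_call_ib_cm_listen(enum sketch_link_layer link_layer)
{
	return link_layer == SKETCH_LINK_IB;
}
```

Under these assumptions, a node GUID whose most significant byte is 0x02 and that has any other bit set would be rejected when passed as service ID, while a GUID with a different top byte would pass. Whether the node GUID of hns_0 falls into the first group is not confirmed here; printing device->node_guid would tell.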
Bart.