Re: [PATCH rdma-next] RDMA: Explicitly pass in the dma_device to ib_register_device

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



-----"Leon Romanovsky" <leon@xxxxxxxxxx> wrote: -----

>To: "Doug Ledford" <dledford@xxxxxxxxxx>, "Jason Gunthorpe"
><jgg@xxxxxxxxxx>
>From: "Leon Romanovsky" <leon@xxxxxxxxxx>
>Date: 09/22/2020 10:28AM
>Cc: "Adit Ranadive" <aditr@xxxxxxxxxx>, "Ariel Elior"
><aelior@xxxxxxxxxxx>, "Bernard Metzler" <bmt@xxxxxxxxxxxxxx>,
>"Christian Benvenuti" <benve@xxxxxxxxx>, "Dennis Dalessandro"
><dennis.dalessandro@xxxxxxxxx>, "Devesh Sharma"
><devesh.sharma@xxxxxxxxxxxx>, "Faisal Latif"
><faisal.latif@xxxxxxxxx>, "Gal Pressman" <galpress@xxxxxxxxxx>,
>"Lijun Ou" <oulijun@xxxxxxxxxx>, linux-rdma@xxxxxxxxxxxxxxx, "Michal
>Kalderon" <mkalderon@xxxxxxxxxxx>, "Mike Marciniszyn"
><mike.marciniszyn@xxxxxxxxx>, "Naresh Kumar PBS"
><nareshkumar.pbs@xxxxxxxxxxxx>, "Nelson Escobar"
><neescoba@xxxxxxxxx>, "Parav Pandit" <parav@xxxxxxxxxx>, "Parvi
>Kaustubhi" <pkaustub@xxxxxxxxx>, "Potnuri Bharat Teja"
><bharat@xxxxxxxxxxx>, "Selvin Xavier" <selvin.xavier@xxxxxxxxxxxx>,
>"Shiraz Saleem" <shiraz.saleem@xxxxxxxxx>, "Somnath Kotur"
><somnath.kotur@xxxxxxxxxxxx>, "Sriharsha Basavapatna"
><sriharsha.basavapatna@xxxxxxxxxxxx>, "VMware PV-Drivers"
><pv-drivers@xxxxxxxxxx>, "Weihang Li" <liweihang@xxxxxxxxxx>, "Wei
>Hu(Xavier)" <huwei87@xxxxxxxxxxxxx>, "Yishai Hadas"
><yishaih@xxxxxxxxxx>, "Zhu Yanjun" <yanjunz@xxxxxxxxxx>
>Subject: [EXTERNAL] [PATCH rdma-next] RDMA: Explicitly pass in the
>dma_device to ib_register_device
>
>From: Jason Gunthorpe <jgg@xxxxxxxxxx>
>
>The current design is convoulted where the IB core makes assumptions
>that a real DMA device will usually come from parent unless it looks
>like the ib_device is partially setup for DMA, in which case the
>ib_device itself is used for DMA, but somethings might still come
>from the parent.
>
>Make this clearer by having the caller explicitly specify what the
>DMA device should be. The caller is always responsible to fully
>setup the DMA device it specifies. If NULL is used then the
>ib_device will be used as the DMA device, but the caller must
>still set it up completely.
>
>rvt is the only driver that did not fully setup the DMA device
>before registering. Move the rvt specific code out of
>setup_dma_device() into rvt and set the dma_mask's directly.
>
>Signed-off-by: Jason Gunthorpe <jgg@xxxxxxxxxx>
>Reviewed-by: Parav Pandit <parav@xxxxxxxxxx>
>Signed-off-by: Leon Romanovsky <leonro@xxxxxxxxxx>
>---
> drivers/infiniband/core/device.c              | 73
>+++++++------------
> drivers/infiniband/hw/bnxt_re/main.c          |  2 +-
> drivers/infiniband/hw/cxgb4/provider.c        |  3 +-
> drivers/infiniband/hw/efa/efa_main.c          |  2 +-
> drivers/infiniband/hw/hns/hns_roce_main.c     |  2 +-
> drivers/infiniband/hw/i40iw/i40iw_verbs.c     |  2 +-
> drivers/infiniband/hw/mlx4/main.c             |  3 +-
> drivers/infiniband/hw/mlx5/main.c             |  2 +-
> drivers/infiniband/hw/mthca/mthca_provider.c  |  2 +-
> drivers/infiniband/hw/ocrdma/ocrdma_main.c    |  3 +-
> drivers/infiniband/hw/qedr/main.c             |  2 +-
> drivers/infiniband/hw/usnic/usnic_ib_main.c   |  2 +-
> .../infiniband/hw/vmw_pvrdma/pvrdma_main.c    |  2 +-
> drivers/infiniband/sw/rdmavt/vt.c             |  8 +-
> drivers/infiniband/sw/rxe/rxe_verbs.c         |  2 +-
> drivers/infiniband/sw/siw/siw_main.c          |  4 +-
> include/rdma/ib_verbs.h                       |  3 +-
> 17 files changed, 52 insertions(+), 65 deletions(-)
>
>diff --git a/drivers/infiniband/core/device.c
>b/drivers/infiniband/core/device.c
>index ec3becf85cac..417d93bbdaca 100644
>--- a/drivers/infiniband/core/device.c
>+++ b/drivers/infiniband/core/device.c
>@@ -1177,58 +1177,34 @@ static int assign_name(struct ib_device
>*device, const char *name)
> 	return ret;
> }
> 
>-static void setup_dma_device(struct ib_device *device)
>+static void setup_dma_device(struct ib_device *device,
>+			     struct device *dma_device)
> {
>-	struct device *parent = device->dev.parent;
>-
>-	WARN_ON_ONCE(device->dma_device);
>-
>-#ifdef CONFIG_DMA_OPS
>-	if (device->dev.dma_ops) {
>+	if (!dma_device) {
> 		/*
>-		 * The caller provided custom DMA operations. Copy the
>-		 * DMA-related fields that are used by e.g. dma_alloc_coherent()
>-		 * into device->dev.
>+		 * If the caller does not provide a DMA capable device then the
>+		 * IB device will be used. In this case the caller should fully
>+		 * setup the ibdev for DMA. This usually means using
>+		 * dma_virt_ops.
> 		 */
>-		device->dma_device = &device->dev;
>-		if (!device->dev.dma_mask) {
>-			if (parent)
>-				device->dev.dma_mask = parent->dma_mask;
>-			else
>-				WARN_ON_ONCE(true);
>-		}
>-		if (!device->dev.coherent_dma_mask) {
>-			if (parent)
>-				device->dev.coherent_dma_mask =
>-					parent->coherent_dma_mask;
>-			else
>-				WARN_ON_ONCE(true);
>-		}
>-	} else
>-#endif /* CONFIG_DMA_OPS */
>-	{
>+#ifdef CONFIG_DMA_OPS
>+		if (WARN_ON(!device->dev.dma_ops))
>+			return;
>+#endif
>+		if (WARN_ON(!device->dev.dma_parms))
>+			return;
>+
>+		dma_device = &device->dev;
>+	} else {
>+		device->dev.dma_parms = dma_device->dma_parms;
> 		/*
>-		 * The caller did not provide custom DMA operations. Use the
>-		 * DMA mapping operations of the parent device.
>+		 * Auto setup the segment size if a DMA device was passed in.
>+		 * The PCI core sets the maximum segment size to 64 KB. Increase
>+		 * this parameter to 2 GB.
> 		 */
>-		WARN_ON_ONCE(!parent);
>-		device->dma_device = parent;
>-	}
>-
>-	if (!device->dev.dma_parms) {
>-		if (parent) {
>-			/*
>-			 * The caller did not provide DMA parameters, so
>-			 * 'parent' probably represents a PCI device. The PCI
>-			 * core sets the maximum segment size to 64
>-			 * KB. Increase this parameter to 2 GB.
>-			 */
>-			device->dev.dma_parms = parent->dma_parms;
>-			dma_set_max_seg_size(device->dma_device, SZ_2G);
>-		} else {
>-			WARN_ON_ONCE(true);
>-		}
>+		dma_set_max_seg_size(dma_device, SZ_2G);
> 	}
>+	device->dma_device = dma_device;
> }
> 
> /*
>@@ -1241,7 +1217,6 @@ static int setup_device(struct ib_device
>*device)
> 	struct ib_udata uhw = {.outlen = 0, .inlen = 0};
> 	int ret;
> 
>-	setup_dma_device(device);
> 	ib_device_check_mandatory(device);
> 
> 	ret = setup_port_data(device);
>@@ -1361,7 +1336,8 @@ static void prevent_dealloc_device(struct
>ib_device *ib_dev)
>  * asynchronously then the device pointer may become freed as soon
>as this
>  * function returns.
>  */
>-int ib_register_device(struct ib_device *device, const char *name)
>+int ib_register_device(struct ib_device *device, const char *name,
>+		       struct device *dma_device)
> {
> 	int ret;
> 
>@@ -1369,6 +1345,7 @@ int ib_register_device(struct ib_device
>*device, const char *name)
> 	if (ret)
> 		return ret;
> 
>+	setup_dma_device(device, dma_device);
> 	ret = setup_device(device);
> 	if (ret)
> 		return ret;
>diff --git a/drivers/infiniband/hw/bnxt_re/main.c
>b/drivers/infiniband/hw/bnxt_re/main.c
>index 53aee5a42ab8..b3bc62021039 100644
>--- a/drivers/infiniband/hw/bnxt_re/main.c
>+++ b/drivers/infiniband/hw/bnxt_re/main.c
>@@ -736,7 +736,7 @@ static int bnxt_re_register_ib(struct bnxt_re_dev
>*rdev)
> 	if (ret)
> 		return ret;
> 
>-	return ib_register_device(ibdev, "bnxt_re%d");
>+	return ib_register_device(ibdev, "bnxt_re%d",
>&rdev->en_dev->pdev->dev);
> }
> 
> static void bnxt_re_dev_remove(struct bnxt_re_dev *rdev)
>diff --git a/drivers/infiniband/hw/cxgb4/provider.c
>b/drivers/infiniband/hw/cxgb4/provider.c
>index 4b76f2f3f4e4..5f4f3abf41e4 100644
>--- a/drivers/infiniband/hw/cxgb4/provider.c
>+++ b/drivers/infiniband/hw/cxgb4/provider.c
>@@ -570,7 +570,8 @@ void c4iw_register_device(struct work_struct
>*work)
> 	ret = set_netdevs(&dev->ibdev, &dev->rdev);
> 	if (ret)
> 		goto err_dealloc_ctx;
>-	ret = ib_register_device(&dev->ibdev, "cxgb4_%d");
>+	ret = ib_register_device(&dev->ibdev, "cxgb4_%d",
>+				 &dev->rdev.lldi.pdev->dev);
> 	if (ret)
> 		goto err_dealloc_ctx;
> 	return;
>diff --git a/drivers/infiniband/hw/efa/efa_main.c
>b/drivers/infiniband/hw/efa/efa_main.c
>index 92d701146320..4de5be3e1dfe 100644
>--- a/drivers/infiniband/hw/efa/efa_main.c
>+++ b/drivers/infiniband/hw/efa/efa_main.c
>@@ -331,7 +331,7 @@ static int efa_ib_device_add(struct efa_dev *dev)
> 
> 	ib_set_device_ops(&dev->ibdev, &efa_dev_ops);
> 
>-	err = ib_register_device(&dev->ibdev, "efa_%d");
>+	err = ib_register_device(&dev->ibdev, "efa_%d", &pdev->dev);
> 	if (err)
> 		goto err_release_doorbell_bar;
> 
>diff --git a/drivers/infiniband/hw/hns/hns_roce_main.c
>b/drivers/infiniband/hw/hns/hns_roce_main.c
>index 2b4d75733e72..1b5f895d7daf 100644
>--- a/drivers/infiniband/hw/hns/hns_roce_main.c
>+++ b/drivers/infiniband/hw/hns/hns_roce_main.c
>@@ -547,7 +547,7 @@ static int hns_roce_register_device(struct
>hns_roce_dev *hr_dev)
> 		if (ret)
> 			return ret;
> 	}
>-	ret = ib_register_device(ib_dev, "hns_%d");
>+	ret = ib_register_device(ib_dev, "hns_%d", dev);
> 	if (ret) {
> 		dev_err(dev, "ib_register_device failed!\n");
> 		return ret;
>diff --git a/drivers/infiniband/hw/i40iw/i40iw_verbs.c
>b/drivers/infiniband/hw/i40iw/i40iw_verbs.c
>index e53f6c0dc12e..945d30a86bbc 100644
>--- a/drivers/infiniband/hw/i40iw/i40iw_verbs.c
>+++ b/drivers/infiniband/hw/i40iw/i40iw_verbs.c
>@@ -2748,7 +2748,7 @@ int i40iw_register_rdma_device(struct
>i40iw_device *iwdev)
> 	if (ret)
> 		goto error;
> 
>-	ret = ib_register_device(&iwibdev->ibdev, "i40iw%d");
>+	ret = ib_register_device(&iwibdev->ibdev, "i40iw%d",
>&iwdev->hw.pcidev->dev);
> 	if (ret)
> 		goto error;
> 
>diff --git a/drivers/infiniband/hw/mlx4/main.c
>b/drivers/infiniband/hw/mlx4/main.c
>index 753c70402498..cd0fba6b0964 100644
>--- a/drivers/infiniband/hw/mlx4/main.c
>+++ b/drivers/infiniband/hw/mlx4/main.c
>@@ -2841,7 +2841,8 @@ static void *mlx4_ib_add(struct mlx4_dev *dev)
> 		goto err_steer_free_bitmap;
> 
> 	rdma_set_device_sysfs_group(&ibdev->ib_dev, &mlx4_attr_group);
>-	if (ib_register_device(&ibdev->ib_dev, "mlx4_%d"))
>+	if (ib_register_device(&ibdev->ib_dev, "mlx4_%d",
>+			       &dev->persist->pdev->dev))
> 		goto err_diag_counters;
> 
> 	if (mlx4_ib_mad_init(ibdev))
>diff --git a/drivers/infiniband/hw/mlx5/main.c
>b/drivers/infiniband/hw/mlx5/main.c
>index 3ae681a6ae3b..bca57c7661eb 100644
>--- a/drivers/infiniband/hw/mlx5/main.c
>+++ b/drivers/infiniband/hw/mlx5/main.c
>@@ -4404,7 +4404,7 @@ static int mlx5_ib_stage_ib_reg_init(struct
>mlx5_ib_dev *dev)
> 		name = "mlx5_%d";
> 	else
> 		name = "mlx5_bond_%d";
>-	return ib_register_device(&dev->ib_dev, name);
>+	return ib_register_device(&dev->ib_dev, name,
>&dev->mdev->pdev->dev);
> }
> 
> static void mlx5_ib_stage_pre_ib_reg_umr_cleanup(struct mlx5_ib_dev
>*dev)
>diff --git a/drivers/infiniband/hw/mthca/mthca_provider.c
>b/drivers/infiniband/hw/mthca/mthca_provider.c
>index 31b558ff8218..c4d9cdc4ee97 100644
>--- a/drivers/infiniband/hw/mthca/mthca_provider.c
>+++ b/drivers/infiniband/hw/mthca/mthca_provider.c
>@@ -1206,7 +1206,7 @@ int mthca_register_device(struct mthca_dev
>*dev)
> 	mutex_init(&dev->cap_mask_mutex);
> 
> 	rdma_set_device_sysfs_group(&dev->ib_dev, &mthca_attr_group);
>-	ret = ib_register_device(&dev->ib_dev, "mthca%d");
>+	ret = ib_register_device(&dev->ib_dev, "mthca%d", &dev->pdev->dev);
> 	if (ret)
> 		return ret;
> 
>diff --git a/drivers/infiniband/hw/ocrdma/ocrdma_main.c
>b/drivers/infiniband/hw/ocrdma/ocrdma_main.c
>index d8c47d24d6d6..60416186f1d0 100644
>--- a/drivers/infiniband/hw/ocrdma/ocrdma_main.c
>+++ b/drivers/infiniband/hw/ocrdma/ocrdma_main.c
>@@ -255,7 +255,8 @@ static int ocrdma_register_device(struct
>ocrdma_dev *dev)
> 	if (ret)
> 		return ret;
> 
>-	return ib_register_device(&dev->ibdev, "ocrdma%d");
>+	return ib_register_device(&dev->ibdev, "ocrdma%d",
>+				  &dev->nic_info.pdev->dev);
> }
> 
> static int ocrdma_alloc_resources(struct ocrdma_dev *dev)
>diff --git a/drivers/infiniband/hw/qedr/main.c
>b/drivers/infiniband/hw/qedr/main.c
>index 7c0aac3e635b..464becdd41f7 100644
>--- a/drivers/infiniband/hw/qedr/main.c
>+++ b/drivers/infiniband/hw/qedr/main.c
>@@ -293,7 +293,7 @@ static int qedr_register_device(struct qedr_dev
>*dev)
> 	if (rc)
> 		return rc;
> 
>-	return ib_register_device(&dev->ibdev, "qedr%d");
>+	return ib_register_device(&dev->ibdev, "qedr%d", &dev->pdev->dev);
> }
> 
> /* This function allocates fast-path status block memory */
>diff --git a/drivers/infiniband/hw/usnic/usnic_ib_main.c
>b/drivers/infiniband/hw/usnic/usnic_ib_main.c
>index 462ed71abf53..6c23a5472168 100644
>--- a/drivers/infiniband/hw/usnic/usnic_ib_main.c
>+++ b/drivers/infiniband/hw/usnic/usnic_ib_main.c
>@@ -425,7 +425,7 @@ static void *usnic_ib_device_add(struct pci_dev
>*dev)
> 	if (ret)
> 		goto err_fwd_dealloc;
> 
>-	if (ib_register_device(&us_ibdev->ib_dev, "usnic_%d"))
>+	if (ib_register_device(&us_ibdev->ib_dev, "usnic_%d", &dev->dev))
> 		goto err_fwd_dealloc;
> 
> 	usnic_fwd_set_mtu(us_ibdev->ufdev, us_ibdev->netdev->mtu);
>diff --git a/drivers/infiniband/hw/vmw_pvrdma/pvrdma_main.c
>b/drivers/infiniband/hw/vmw_pvrdma/pvrdma_main.c
>index 780fd2dfc07e..5b2c94441125 100644
>--- a/drivers/infiniband/hw/vmw_pvrdma/pvrdma_main.c
>+++ b/drivers/infiniband/hw/vmw_pvrdma/pvrdma_main.c
>@@ -270,7 +270,7 @@ static int pvrdma_register_device(struct
>pvrdma_dev *dev)
> 	spin_lock_init(&dev->srq_tbl_lock);
> 	rdma_set_device_sysfs_group(&dev->ib_dev, &pvrdma_attr_group);
> 
>-	ret = ib_register_device(&dev->ib_dev, "vmw_pvrdma%d");
>+	ret = ib_register_device(&dev->ib_dev, "vmw_pvrdma%d",
>&dev->pdev->dev);
> 	if (ret)
> 		goto err_srq_free;
> 
>diff --git a/drivers/infiniband/sw/rdmavt/vt.c
>b/drivers/infiniband/sw/rdmavt/vt.c
>index f904bb34477a..2f117ac11c8b 100644
>--- a/drivers/infiniband/sw/rdmavt/vt.c
>+++ b/drivers/infiniband/sw/rdmavt/vt.c
>@@ -581,7 +581,11 @@ int rvt_register_device(struct rvt_dev_info
>*rdi)
> 	spin_lock_init(&rdi->n_cqs_lock);
> 
> 	/* DMA Operations */
>-	rdi->ibdev.dev.dma_ops = rdi->ibdev.dev.dma_ops ? : &dma_virt_ops;
>+	rdi->ibdev.dev.dma_ops = &dma_virt_ops;
>+	rdi->ibdev.dev.dma_parms = rdi->ibdev.dev.parent->dma_parms;
>+	rdi->ibdev.dev.dma_mask = rdi->ibdev.dev.parent->dma_mask;
>+	rdi->ibdev.dev.coherent_dma_mask =
>+		rdi->ibdev.dev.parent->coherent_dma_mask;
> 
> 	/* Protection Domain */
> 	spin_lock_init(&rdi->n_pds_lock);
>@@ -629,7 +633,7 @@ int rvt_register_device(struct rvt_dev_info *rdi)
> 		rdi->ibdev.num_comp_vectors = 1;
> 
> 	/* We are now good to announce we exist */
>-	ret = ib_register_device(&rdi->ibdev, dev_name(&rdi->ibdev.dev));
>+	ret = ib_register_device(&rdi->ibdev, dev_name(&rdi->ibdev.dev),
>NULL);
> 	if (ret) {
> 		rvt_pr_err(rdi, "Failed to register driver with ib core.\n");
> 		goto bail_wss;
>diff --git a/drivers/infiniband/sw/rxe/rxe_verbs.c
>b/drivers/infiniband/sw/rxe/rxe_verbs.c
>index f368dc16281a..37fee72755be 100644
>--- a/drivers/infiniband/sw/rxe/rxe_verbs.c
>+++ b/drivers/infiniband/sw/rxe/rxe_verbs.c
>@@ -1182,7 +1182,7 @@ int rxe_register_device(struct rxe_dev *rxe,
>const char *ibdev_name)
> 	rxe->tfm = tfm;
> 
> 	rdma_set_device_sysfs_group(dev, &rxe_attr_group);
>-	err = ib_register_device(dev, ibdev_name);
>+	err = ib_register_device(dev, ibdev_name, NULL);
> 	if (err)
> 		pr_warn("%s failed with error %d\n", __func__, err);
> 
>diff --git a/drivers/infiniband/sw/siw/siw_main.c
>b/drivers/infiniband/sw/siw/siw_main.c
>index d862bec84376..0362d57b4db8 100644
>--- a/drivers/infiniband/sw/siw/siw_main.c
>+++ b/drivers/infiniband/sw/siw/siw_main.c
>@@ -69,7 +69,7 @@ static int siw_device_register(struct siw_device
>*sdev, const char *name)
> 
> 	sdev->vendor_part_id = dev_id++;
> 
>-	rv = ib_register_device(base_dev, name);
>+	rv = ib_register_device(base_dev, name, NULL);
> 	if (rv) {
> 		pr_warn("siw: device registration error %d\n", rv);
> 		return rv;
>@@ -386,6 +386,8 @@ static struct siw_device
>*siw_device_create(struct net_device *netdev)
> 	base_dev->dev.dma_parms = &sdev->dma_parms;
> 	sdev->dma_parms = (struct device_dma_parameters)
> 		{ .max_segment_size = SZ_2G };
>+	dma_coerce_mask_and_coherent(&base_dev->dev,
>+				     dma_get_required_mask(&base_dev->dev));

Leon, can you please help me to understand this
additional logic? Do we need to setup the DMA device
for (software) RDMA devices which rely on dma_virt_ops
in the end, or better leave it untouched?

Thanks very much!
Bernard.





[Index of Archives]     [Linux USB Devel]     [Video for Linux]     [Linux Audio Users]     [Photo]     [Yosemite News]     [Yosemite Photos]     [Linux Kernel]     [Linux SCSI]     [XFree86]

  Powered by Linux