-----"Leon Romanovsky" <leon@xxxxxxxxxx> wrote: ----- >To: "Doug Ledford" <dledford@xxxxxxxxxx>, "Jason Gunthorpe" ><jgg@xxxxxxxxxx> >From: "Leon Romanovsky" <leon@xxxxxxxxxx> >Date: 09/22/2020 10:28AM >Cc: "Adit Ranadive" <aditr@xxxxxxxxxx>, "Ariel Elior" ><aelior@xxxxxxxxxxx>, "Bernard Metzler" <bmt@xxxxxxxxxxxxxx>, >"Christian Benvenuti" <benve@xxxxxxxxx>, "Dennis Dalessandro" ><dennis.dalessandro@xxxxxxxxx>, "Devesh Sharma" ><devesh.sharma@xxxxxxxxxxxx>, "Faisal Latif" ><faisal.latif@xxxxxxxxx>, "Gal Pressman" <galpress@xxxxxxxxxx>, >"Lijun Ou" <oulijun@xxxxxxxxxx>, linux-rdma@xxxxxxxxxxxxxxx, "Michal >Kalderon" <mkalderon@xxxxxxxxxxx>, "Mike Marciniszyn" ><mike.marciniszyn@xxxxxxxxx>, "Naresh Kumar PBS" ><nareshkumar.pbs@xxxxxxxxxxxx>, "Nelson Escobar" ><neescoba@xxxxxxxxx>, "Parav Pandit" <parav@xxxxxxxxxx>, "Parvi >Kaustubhi" <pkaustub@xxxxxxxxx>, "Potnuri Bharat Teja" ><bharat@xxxxxxxxxxx>, "Selvin Xavier" <selvin.xavier@xxxxxxxxxxxx>, >"Shiraz Saleem" <shiraz.saleem@xxxxxxxxx>, "Somnath Kotur" ><somnath.kotur@xxxxxxxxxxxx>, "Sriharsha Basavapatna" ><sriharsha.basavapatna@xxxxxxxxxxxx>, "VMware PV-Drivers" ><pv-drivers@xxxxxxxxxx>, "Weihang Li" <liweihang@xxxxxxxxxx>, "Wei >Hu(Xavier)" <huwei87@xxxxxxxxxxxxx>, "Yishai Hadas" ><yishaih@xxxxxxxxxx>, "Zhu Yanjun" <yanjunz@xxxxxxxxxx> >Subject: [EXTERNAL] [PATCH rdma-next] RDMA: Explicitly pass in the >dma_device to ib_register_device > >From: Jason Gunthorpe <jgg@xxxxxxxxxx> > >The current design is convoulted where the IB core makes assumptions >that a real DMA device will usually come from parent unless it looks >like the ib_device is partially setup for DMA, in which case the >ib_device itself is used for DMA, but somethings might still come >from the parent. > >Make this clearer by having the caller explicitly specify what the >DMA device should be. The caller is always responsible to fully >setup the DMA device it specifies. If NULL is used then the >ib_device will be used as the DMA device, but the caller must >still set it up completely. > >rvt is the only driver that did not fully setup the DMA device >before registering. Move the rvt specific code out of >setup_dma_device() into rvt and set the dma_mask's directly. > >Signed-off-by: Jason Gunthorpe <jgg@xxxxxxxxxx> >Reviewed-by: Parav Pandit <parav@xxxxxxxxxx> >Signed-off-by: Leon Romanovsky <leonro@xxxxxxxxxx> >--- > drivers/infiniband/core/device.c | 73 >+++++++------------ > drivers/infiniband/hw/bnxt_re/main.c | 2 +- > drivers/infiniband/hw/cxgb4/provider.c | 3 +- > drivers/infiniband/hw/efa/efa_main.c | 2 +- > drivers/infiniband/hw/hns/hns_roce_main.c | 2 +- > drivers/infiniband/hw/i40iw/i40iw_verbs.c | 2 +- > drivers/infiniband/hw/mlx4/main.c | 3 +- > drivers/infiniband/hw/mlx5/main.c | 2 +- > drivers/infiniband/hw/mthca/mthca_provider.c | 2 +- > drivers/infiniband/hw/ocrdma/ocrdma_main.c | 3 +- > drivers/infiniband/hw/qedr/main.c | 2 +- > drivers/infiniband/hw/usnic/usnic_ib_main.c | 2 +- > .../infiniband/hw/vmw_pvrdma/pvrdma_main.c | 2 +- > drivers/infiniband/sw/rdmavt/vt.c | 8 +- > drivers/infiniband/sw/rxe/rxe_verbs.c | 2 +- > drivers/infiniband/sw/siw/siw_main.c | 4 +- > include/rdma/ib_verbs.h | 3 +- > 17 files changed, 52 insertions(+), 65 deletions(-) > >diff --git a/drivers/infiniband/core/device.c >b/drivers/infiniband/core/device.c >index ec3becf85cac..417d93bbdaca 100644 >--- a/drivers/infiniband/core/device.c >+++ b/drivers/infiniband/core/device.c >@@ -1177,58 +1177,34 @@ static int assign_name(struct ib_device >*device, const char *name) > return ret; > } > >-static void setup_dma_device(struct ib_device *device) >+static void setup_dma_device(struct ib_device *device, >+ struct device *dma_device) > { >- struct device *parent = device->dev.parent; >- >- WARN_ON_ONCE(device->dma_device); >- >-#ifdef CONFIG_DMA_OPS >- if (device->dev.dma_ops) { >+ if (!dma_device) { > /* >- * The caller provided custom DMA operations. Copy the >- * DMA-related fields that are used by e.g. dma_alloc_coherent() >- * into device->dev. >+ * If the caller does not provide a DMA capable device then the >+ * IB device will be used. In this case the caller should fully >+ * setup the ibdev for DMA. This usually means using >+ * dma_virt_ops. > */ >- device->dma_device = &device->dev; >- if (!device->dev.dma_mask) { >- if (parent) >- device->dev.dma_mask = parent->dma_mask; >- else >- WARN_ON_ONCE(true); >- } >- if (!device->dev.coherent_dma_mask) { >- if (parent) >- device->dev.coherent_dma_mask = >- parent->coherent_dma_mask; >- else >- WARN_ON_ONCE(true); >- } >- } else >-#endif /* CONFIG_DMA_OPS */ >- { >+#ifdef CONFIG_DMA_OPS >+ if (WARN_ON(!device->dev.dma_ops)) >+ return; >+#endif >+ if (WARN_ON(!device->dev.dma_parms)) >+ return; >+ >+ dma_device = &device->dev; >+ } else { >+ device->dev.dma_parms = dma_device->dma_parms; > /* >- * The caller did not provide custom DMA operations. Use the >- * DMA mapping operations of the parent device. >+ * Auto setup the segment size if a DMA device was passed in. >+ * The PCI core sets the maximum segment size to 64 KB. Increase >+ * this parameter to 2 GB. > */ >- WARN_ON_ONCE(!parent); >- device->dma_device = parent; >- } >- >- if (!device->dev.dma_parms) { >- if (parent) { >- /* >- * The caller did not provide DMA parameters, so >- * 'parent' probably represents a PCI device. The PCI >- * core sets the maximum segment size to 64 >- * KB. Increase this parameter to 2 GB. >- */ >- device->dev.dma_parms = parent->dma_parms; >- dma_set_max_seg_size(device->dma_device, SZ_2G); >- } else { >- WARN_ON_ONCE(true); >- } >+ dma_set_max_seg_size(dma_device, SZ_2G); > } >+ device->dma_device = dma_device; > } > > /* >@@ -1241,7 +1217,6 @@ static int setup_device(struct ib_device >*device) > struct ib_udata uhw = {.outlen = 0, .inlen = 0}; > int ret; > >- setup_dma_device(device); > ib_device_check_mandatory(device); > > ret = setup_port_data(device); >@@ -1361,7 +1336,8 @@ static void prevent_dealloc_device(struct >ib_device *ib_dev) > * asynchronously then the device pointer may become freed as soon >as this > * function returns. > */ >-int ib_register_device(struct ib_device *device, const char *name) >+int ib_register_device(struct ib_device *device, const char *name, >+ struct device *dma_device) > { > int ret; > >@@ -1369,6 +1345,7 @@ int ib_register_device(struct ib_device >*device, const char *name) > if (ret) > return ret; > >+ setup_dma_device(device, dma_device); > ret = setup_device(device); > if (ret) > return ret; >diff --git a/drivers/infiniband/hw/bnxt_re/main.c >b/drivers/infiniband/hw/bnxt_re/main.c >index 53aee5a42ab8..b3bc62021039 100644 >--- a/drivers/infiniband/hw/bnxt_re/main.c >+++ b/drivers/infiniband/hw/bnxt_re/main.c >@@ -736,7 +736,7 @@ static int bnxt_re_register_ib(struct bnxt_re_dev >*rdev) > if (ret) > return ret; > >- return ib_register_device(ibdev, "bnxt_re%d"); >+ return ib_register_device(ibdev, "bnxt_re%d", >&rdev->en_dev->pdev->dev); > } > > static void bnxt_re_dev_remove(struct bnxt_re_dev *rdev) >diff --git a/drivers/infiniband/hw/cxgb4/provider.c >b/drivers/infiniband/hw/cxgb4/provider.c >index 4b76f2f3f4e4..5f4f3abf41e4 100644 >--- a/drivers/infiniband/hw/cxgb4/provider.c >+++ b/drivers/infiniband/hw/cxgb4/provider.c >@@ -570,7 +570,8 @@ void c4iw_register_device(struct work_struct >*work) > ret = set_netdevs(&dev->ibdev, &dev->rdev); > if (ret) > goto err_dealloc_ctx; >- ret = ib_register_device(&dev->ibdev, "cxgb4_%d"); >+ ret = ib_register_device(&dev->ibdev, "cxgb4_%d", >+ &dev->rdev.lldi.pdev->dev); > if (ret) > goto err_dealloc_ctx; > return; >diff --git a/drivers/infiniband/hw/efa/efa_main.c >b/drivers/infiniband/hw/efa/efa_main.c >index 92d701146320..4de5be3e1dfe 100644 >--- a/drivers/infiniband/hw/efa/efa_main.c >+++ b/drivers/infiniband/hw/efa/efa_main.c >@@ -331,7 +331,7 @@ static int efa_ib_device_add(struct efa_dev *dev) > > ib_set_device_ops(&dev->ibdev, &efa_dev_ops); > >- err = ib_register_device(&dev->ibdev, "efa_%d"); >+ err = ib_register_device(&dev->ibdev, "efa_%d", &pdev->dev); > if (err) > goto err_release_doorbell_bar; > >diff --git a/drivers/infiniband/hw/hns/hns_roce_main.c >b/drivers/infiniband/hw/hns/hns_roce_main.c >index 2b4d75733e72..1b5f895d7daf 100644 >--- a/drivers/infiniband/hw/hns/hns_roce_main.c >+++ b/drivers/infiniband/hw/hns/hns_roce_main.c >@@ -547,7 +547,7 @@ static int hns_roce_register_device(struct >hns_roce_dev *hr_dev) > if (ret) > return ret; > } >- ret = ib_register_device(ib_dev, "hns_%d"); >+ ret = ib_register_device(ib_dev, "hns_%d", dev); > if (ret) { > dev_err(dev, "ib_register_device failed!\n"); > return ret; >diff --git a/drivers/infiniband/hw/i40iw/i40iw_verbs.c >b/drivers/infiniband/hw/i40iw/i40iw_verbs.c >index e53f6c0dc12e..945d30a86bbc 100644 >--- a/drivers/infiniband/hw/i40iw/i40iw_verbs.c >+++ b/drivers/infiniband/hw/i40iw/i40iw_verbs.c >@@ -2748,7 +2748,7 @@ int i40iw_register_rdma_device(struct >i40iw_device *iwdev) > if (ret) > goto error; > >- ret = ib_register_device(&iwibdev->ibdev, "i40iw%d"); >+ ret = ib_register_device(&iwibdev->ibdev, "i40iw%d", >&iwdev->hw.pcidev->dev); > if (ret) > goto error; > >diff --git a/drivers/infiniband/hw/mlx4/main.c >b/drivers/infiniband/hw/mlx4/main.c >index 753c70402498..cd0fba6b0964 100644 >--- a/drivers/infiniband/hw/mlx4/main.c >+++ b/drivers/infiniband/hw/mlx4/main.c >@@ -2841,7 +2841,8 @@ static void *mlx4_ib_add(struct mlx4_dev *dev) > goto err_steer_free_bitmap; > > rdma_set_device_sysfs_group(&ibdev->ib_dev, &mlx4_attr_group); >- if (ib_register_device(&ibdev->ib_dev, "mlx4_%d")) >+ if (ib_register_device(&ibdev->ib_dev, "mlx4_%d", >+ &dev->persist->pdev->dev)) > goto err_diag_counters; > > if (mlx4_ib_mad_init(ibdev)) >diff --git a/drivers/infiniband/hw/mlx5/main.c >b/drivers/infiniband/hw/mlx5/main.c >index 3ae681a6ae3b..bca57c7661eb 100644 >--- a/drivers/infiniband/hw/mlx5/main.c >+++ b/drivers/infiniband/hw/mlx5/main.c >@@ -4404,7 +4404,7 @@ static int mlx5_ib_stage_ib_reg_init(struct >mlx5_ib_dev *dev) > name = "mlx5_%d"; > else > name = "mlx5_bond_%d"; >- return ib_register_device(&dev->ib_dev, name); >+ return ib_register_device(&dev->ib_dev, name, >&dev->mdev->pdev->dev); > } > > static void mlx5_ib_stage_pre_ib_reg_umr_cleanup(struct mlx5_ib_dev >*dev) >diff --git a/drivers/infiniband/hw/mthca/mthca_provider.c >b/drivers/infiniband/hw/mthca/mthca_provider.c >index 31b558ff8218..c4d9cdc4ee97 100644 >--- a/drivers/infiniband/hw/mthca/mthca_provider.c >+++ b/drivers/infiniband/hw/mthca/mthca_provider.c >@@ -1206,7 +1206,7 @@ int mthca_register_device(struct mthca_dev >*dev) > mutex_init(&dev->cap_mask_mutex); > > rdma_set_device_sysfs_group(&dev->ib_dev, &mthca_attr_group); >- ret = ib_register_device(&dev->ib_dev, "mthca%d"); >+ ret = ib_register_device(&dev->ib_dev, "mthca%d", &dev->pdev->dev); > if (ret) > return ret; > >diff --git a/drivers/infiniband/hw/ocrdma/ocrdma_main.c >b/drivers/infiniband/hw/ocrdma/ocrdma_main.c >index d8c47d24d6d6..60416186f1d0 100644 >--- a/drivers/infiniband/hw/ocrdma/ocrdma_main.c >+++ b/drivers/infiniband/hw/ocrdma/ocrdma_main.c >@@ -255,7 +255,8 @@ static int ocrdma_register_device(struct >ocrdma_dev *dev) > if (ret) > return ret; > >- return ib_register_device(&dev->ibdev, "ocrdma%d"); >+ return ib_register_device(&dev->ibdev, "ocrdma%d", >+ &dev->nic_info.pdev->dev); > } > > static int ocrdma_alloc_resources(struct ocrdma_dev *dev) >diff --git a/drivers/infiniband/hw/qedr/main.c >b/drivers/infiniband/hw/qedr/main.c >index 7c0aac3e635b..464becdd41f7 100644 >--- a/drivers/infiniband/hw/qedr/main.c >+++ b/drivers/infiniband/hw/qedr/main.c >@@ -293,7 +293,7 @@ static int qedr_register_device(struct qedr_dev >*dev) > if (rc) > return rc; > >- return ib_register_device(&dev->ibdev, "qedr%d"); >+ return ib_register_device(&dev->ibdev, "qedr%d", &dev->pdev->dev); > } > > /* This function allocates fast-path status block memory */ >diff --git a/drivers/infiniband/hw/usnic/usnic_ib_main.c >b/drivers/infiniband/hw/usnic/usnic_ib_main.c >index 462ed71abf53..6c23a5472168 100644 >--- a/drivers/infiniband/hw/usnic/usnic_ib_main.c >+++ b/drivers/infiniband/hw/usnic/usnic_ib_main.c >@@ -425,7 +425,7 @@ static void *usnic_ib_device_add(struct pci_dev >*dev) > if (ret) > goto err_fwd_dealloc; > >- if (ib_register_device(&us_ibdev->ib_dev, "usnic_%d")) >+ if (ib_register_device(&us_ibdev->ib_dev, "usnic_%d", &dev->dev)) > goto err_fwd_dealloc; > > usnic_fwd_set_mtu(us_ibdev->ufdev, us_ibdev->netdev->mtu); >diff --git a/drivers/infiniband/hw/vmw_pvrdma/pvrdma_main.c >b/drivers/infiniband/hw/vmw_pvrdma/pvrdma_main.c >index 780fd2dfc07e..5b2c94441125 100644 >--- a/drivers/infiniband/hw/vmw_pvrdma/pvrdma_main.c >+++ b/drivers/infiniband/hw/vmw_pvrdma/pvrdma_main.c >@@ -270,7 +270,7 @@ static int pvrdma_register_device(struct >pvrdma_dev *dev) > spin_lock_init(&dev->srq_tbl_lock); > rdma_set_device_sysfs_group(&dev->ib_dev, &pvrdma_attr_group); > >- ret = ib_register_device(&dev->ib_dev, "vmw_pvrdma%d"); >+ ret = ib_register_device(&dev->ib_dev, "vmw_pvrdma%d", >&dev->pdev->dev); > if (ret) > goto err_srq_free; > >diff --git a/drivers/infiniband/sw/rdmavt/vt.c >b/drivers/infiniband/sw/rdmavt/vt.c >index f904bb34477a..2f117ac11c8b 100644 >--- a/drivers/infiniband/sw/rdmavt/vt.c >+++ b/drivers/infiniband/sw/rdmavt/vt.c >@@ -581,7 +581,11 @@ int rvt_register_device(struct rvt_dev_info >*rdi) > spin_lock_init(&rdi->n_cqs_lock); > > /* DMA Operations */ >- rdi->ibdev.dev.dma_ops = rdi->ibdev.dev.dma_ops ? : &dma_virt_ops; >+ rdi->ibdev.dev.dma_ops = &dma_virt_ops; >+ rdi->ibdev.dev.dma_parms = rdi->ibdev.dev.parent->dma_parms; >+ rdi->ibdev.dev.dma_mask = rdi->ibdev.dev.parent->dma_mask; >+ rdi->ibdev.dev.coherent_dma_mask = >+ rdi->ibdev.dev.parent->coherent_dma_mask; > > /* Protection Domain */ > spin_lock_init(&rdi->n_pds_lock); >@@ -629,7 +633,7 @@ int rvt_register_device(struct rvt_dev_info *rdi) > rdi->ibdev.num_comp_vectors = 1; > > /* We are now good to announce we exist */ >- ret = ib_register_device(&rdi->ibdev, dev_name(&rdi->ibdev.dev)); >+ ret = ib_register_device(&rdi->ibdev, dev_name(&rdi->ibdev.dev), >NULL); > if (ret) { > rvt_pr_err(rdi, "Failed to register driver with ib core.\n"); > goto bail_wss; >diff --git a/drivers/infiniband/sw/rxe/rxe_verbs.c >b/drivers/infiniband/sw/rxe/rxe_verbs.c >index f368dc16281a..37fee72755be 100644 >--- a/drivers/infiniband/sw/rxe/rxe_verbs.c >+++ b/drivers/infiniband/sw/rxe/rxe_verbs.c >@@ -1182,7 +1182,7 @@ int rxe_register_device(struct rxe_dev *rxe, >const char *ibdev_name) > rxe->tfm = tfm; > > rdma_set_device_sysfs_group(dev, &rxe_attr_group); >- err = ib_register_device(dev, ibdev_name); >+ err = ib_register_device(dev, ibdev_name, NULL); > if (err) > pr_warn("%s failed with error %d\n", __func__, err); > >diff --git a/drivers/infiniband/sw/siw/siw_main.c >b/drivers/infiniband/sw/siw/siw_main.c >index d862bec84376..0362d57b4db8 100644 >--- a/drivers/infiniband/sw/siw/siw_main.c >+++ b/drivers/infiniband/sw/siw/siw_main.c >@@ -69,7 +69,7 @@ static int siw_device_register(struct siw_device >*sdev, const char *name) > > sdev->vendor_part_id = dev_id++; > >- rv = ib_register_device(base_dev, name); >+ rv = ib_register_device(base_dev, name, NULL); > if (rv) { > pr_warn("siw: device registration error %d\n", rv); > return rv; >@@ -386,6 +386,8 @@ static struct siw_device >*siw_device_create(struct net_device *netdev) > base_dev->dev.dma_parms = &sdev->dma_parms; > sdev->dma_parms = (struct device_dma_parameters) > { .max_segment_size = SZ_2G }; >+ dma_coerce_mask_and_coherent(&base_dev->dev, >+ dma_get_required_mask(&base_dev->dev)); Leon, can you please help me to understand this additional logic? Do we need to setup the DMA device for (software) RDMA devices which rely on dma_virt_ops in the end, or better leave it untouched? Thanks very much! Bernard.