On Mon, Sep 07, 2020 at 03:09:12PM +0300, Leon Romanovsky wrote: > From: Leon Romanovsky <leonro@xxxxxxxxxx> > > Changelog: > v2: > * Rebased on top of the 524d8ffd07f0 > * Removed "udata" check in destroy flows > * Changed ib_free_cq to return early > * Used Jason's suggestion to implement "RDMA/mlx5: Issue FW command to destroy > SRQ on reentry" patch. > v1 > * Changed returned value in efa_destroy_ah() from EINVAL to EOPNOTSUPP > * https://lore.kernel.org/lkml/20200830084010.102381-1-leon@xxxxxxxxxx > v0: > * https://lore.kernel.org/lkml/20200824103247.1088464-1-leon@xxxxxxxxxx > > Hi, > > This series restores the ability to fail on destroy commands, due to the > fact that mlx5_ib DEVX implementation interleaved ib_core objects > with FW objects without sharing reference counters. > > In retrospect, every part of the mlx5_ib flow is correct. > > It started from IBTA which was written by HW engineers with HW in mind and > they allowed to fail in destruction. FW implemented it with symmetrical > interface like any other command and propagated error back to the kernel, > which forwarded it to the libibverbs and kernel ULPs. > > Libibverbs was designed with IBTA spec in hand putting destroy errors in > stone. Up till mlx5_ib DEVX, it worked well, because the IB verbs objects > are counted by the kernel and ib_core ensures that FW destroy will success > by managing various reference counters on such objects. > > The extension of the mlx5 driver changed this flow when allowed DEVX objects > that are not managed by ib_core to be interleaved with the ones under ib_core > responsibility. > > The drivers that want to implement DEVX flows must ensure that FW/HW > destroys are performed as early as possible before any other internal > cleanup. After HW destroys, drivers are not allowed to fail. > > This series includes two patches (WQ and "potential race") that will > require extra work in mlx5_ib, they both theoretical. WQ is not in use > in DEVX, but is needed to make interface symmetrical to other objects. > "Potential race" is in ULP flow that ensures that SRQ is destroyed in > proper order. > > Thanks > > Leon Romanovsky (9): > RDMA: Restore ability to fail on PD deallocate > RDMA: Restore ability to fail on AH destroy > RDMA/mlx5: Issue FW command to destroy SRQ on reentry > RDMA: Restore ability to fail on SRQ destroy > RDMA/core: Delete function indirection for alloc/free kernel CQ > RDMA: Allow fail of destroy CQ > RDMA: Change XRCD destroy return value > RDMA: Restore ability to return error for destroy WQ > RDMA: Make counters destroy symmetrical Thanks, applied to for-next with the changes I noted: diff --git a/drivers/infiniband/hw/mlx5/cq.c b/drivers/infiniband/hw/mlx5/cq.c index b2381e01bf6345..35e5bbb44d3d8e 100644 --- a/drivers/infiniband/hw/mlx5/cq.c +++ b/drivers/infiniband/hw/mlx5/cq.c @@ -1031,15 +1031,14 @@ int mlx5_ib_destroy_cq(struct ib_cq *cq, struct ib_udata *udata) int ret; ret = mlx5_core_destroy_cq(dev->mdev, &mcq->mcq); - if (ret && udata) + if (ret) return ret; - if (udata) { + if (udata) destroy_cq_user(mcq, udata); - return 0; - } - destroy_cq_kernel(dev, mcq); - return ret; + else + destroy_cq_kernel(dev, mcq); + return 0; } static int is_equal_rsn(struct mlx5_cqe64 *cqe64, u32 rsn) diff --git a/drivers/infiniband/hw/mlx5/qp.c b/drivers/infiniband/hw/mlx5/qp.c index 039f55fd067640..6dfdc13bc36395 100644 --- a/drivers/infiniband/hw/mlx5/qp.c +++ b/drivers/infiniband/hw/mlx5/qp.c @@ -5092,11 +5092,11 @@ int mlx5_ib_destroy_wq(struct ib_wq *wq, struct ib_udata *udata) int ret; ret = mlx5_core_destroy_rq_tracked(dev, &rwq->core_qp); - if (ret && udata) + if (ret) return ret; destroy_user_rq(dev, wq->pd, rwq, udata); kfree(rwq); - return ret; + return 0; } struct ib_rwq_ind_table *mlx5_ib_create_rwq_ind_table(struct ib_device *device, diff --git a/drivers/infiniband/hw/mlx5/srq.c b/drivers/infiniband/hw/mlx5/srq.c index 6789b8a6927467..e2f720eec1e18b 100644 --- a/drivers/infiniband/hw/mlx5/srq.c +++ b/drivers/infiniband/hw/mlx5/srq.c @@ -396,17 +396,14 @@ int mlx5_ib_destroy_srq(struct ib_srq *srq, struct ib_udata *udata) int ret; ret = mlx5_cmd_destroy_srq(dev, &msrq->msrq); - if (ret && udata) + if (ret) return ret; - if (udata) { + if (udata) destroy_srq_user(srq->pd, msrq, udata); - return 0; - } - - /* We are cleaning kernel resources anyway */ - destroy_srq_kernel(dev, msrq); - return ret; + else + destroy_srq_kernel(dev, msrq); + return 0; } void mlx5_ib_free_srq_wqe(struct mlx5_ib_srq *srq, int wqe_index) diff --git a/drivers/infiniband/hw/mlx5/srq_cmd.c b/drivers/infiniband/hw/mlx5/srq_cmd.c index 1a707c2d364c1f..db889ec3fd48e8 100644 --- a/drivers/infiniband/hw/mlx5/srq_cmd.c +++ b/drivers/infiniband/hw/mlx5/srq_cmd.c @@ -598,7 +598,7 @@ int mlx5_cmd_destroy_srq(struct mlx5_ib_dev *dev, struct mlx5_core_srq *srq) /* Delete entry, but leave index occupied */ tmp = xa_cmpxchg_irq(&table->array, srq->srqn, srq, XA_ZERO_ENTRY, 0); - if (WARN_ON(!tmp || tmp != srq) || xa_err(tmp)) + if (WARN_ON(tmp != srq)) return xa_err(tmp) ?: -EINVAL; err = destroy_srq_split(dev, srq); Jason