On Fri, Apr 08, 2022 at 12:30:35PM -0700, Saeed Mahameed wrote: > On 06 Apr 10:55, Leon Romanovsky wrote: > > On Tue, Apr 05, 2022 at 12:48:45PM -0700, Saeed Mahameed wrote: > > > On 05 Apr 11:12, Leon Romanovsky wrote: > > > > From: Patrisious Haddad <phaddad@xxxxxxxxxx> > > > > > > > > Prior to this patch in the case that destroy_unmap_eq() > > > > failed and was called again, it triggered an additional call of > > > > > > Where is it being failed and called again ? this shouldn't even be an > > > option, we try to keep mlx5 symmetrical, constructors and destructors are > > > supposed to be called only once in their respective positions. > > > the callers must be fixed to avoid re-entry, or change destructors to clear > > > up all resources even on failures, no matter what do not invent a reentry > > > protocols to mlx5 destructors. > > > > It can happen when QP is exposed through DEVX interface. In that flow, > > only FW knows about it and reference count all users. This means that > > attempt to destroy such QP will fail, but mlx5_core is structured in > > such way that all cleanup was done before calling to FW to get > > success/fail response. > > I wasn't talking about destroy_qp, actually destroy_qp is implemented the > way i am asking you to implement destroy_eq(); remove debugfs on first call > to destroy EQ, and drop the reentry logic from from mlx5_eq_destroy_generic > and destroy_async_eq. > > EQ is a core/mlx5_ib resources, it's not exposed to user nor DEVX, it > shouldn't be subject to DEVX limitations. I tend to agree with you. I'll take another look on it and resubmit. > > Also looking at the destroy_qp implementation, it removes the debugfs > unconditionally even if the QP has ref count and removal will fail in FW. > just FYI. Right, we don't care about debugfs. > > For EQ I don't even understand why devx can cause ODP EQ removal to fail.. > you must fix this at mlx5_ib layer, but for this patch, please drop the > re-entry and remove debugfs in destroy_eq, unconditionally. The reason to complexity is not debugfs, but an existence of "mlx5_frag_buf_free(dev, &eq->frag_buf);" line, after FW command is executed. We need to separate to two flows: the one that can tolerate FW cmd failures and the one that can't. If you don't add "reentry" flag, you can (theoretically) find yourself leaking ->frag_buf in the flows that don't know how to reentry. I'll resubmit. Thanks