On Tue, Mar 21, 2023 at 08:53:35AM -0300, Jason Gunthorpe wrote: > On Tue, Mar 21, 2023 at 09:54:58AM +0200, Leon Romanovsky wrote: > > On Mon, Mar 20, 2023 at 04:18:14PM -0300, Jason Gunthorpe wrote: > > > On Thu, Mar 16, 2023 at 03:39:27PM +0200, Leon Romanovsky wrote: > > > > From: Patrisious Haddad <phaddad@xxxxxxxxxx> > > > > > > > > Previously when destroying a DCT, if the firmware function for the > > > > destruction failed, the common resource would have been destroyed > > > > either way, since it was destroyed before the firmware object. > > > > Which leads to kernel warning "refcount_t: underflow" which indicates > > > > possible use-after-free. > > > > Which is triggered when we try to destroy the common resource for the > > > > second time and execute refcount_dec_and_test(&common->refcount). > > > > > > > > So, currently before destroying the common resource we check its > > > > refcount and continue with the destruction only if it isn't zero. > > > > > > This seems super sketchy > > > > > > If the destruction fails why not set the refcount back to 1? > > > > Because destruction will fail in destroy_rq_tracked() which is after > > destroy_resource_common(). > > > > In first destruction attempt, we delete qp from radix tree and wait for all > > reference to drop. In order do not undo all this logic (setting 1 alone is > > not enough), it is much safer simply skip destroy_resource_common() in reentry > > case. > > This is the bug I pointed a long time ago, it is ordered wrong to > remove restrack before destruction is assured It is not restrack, but internal to mlx5_core structure. 176 static void destroy_resource_common(struct mlx5_ib_dev *dev, 177 struct mlx5_core_qp *qp) 178 { 179 struct mlx5_qp_table *table = &dev->qp_table; 180 unsigned long flags; 181 .... 185 spin_lock_irqsave(&table->lock, flags); 186 radix_tree_delete(&table->tree, 187 qp->qpn | (qp->common.res << MLX5_USER_INDEX_LEN)); 188 spin_unlock_irqrestore(&table->lock, flags); 189 mlx5_core_put_rsc((struct mlx5_core_rsc_common *)qp); 190 wait_for_completion(&qp->common.free); 191 } > > Jason