On Tue, Mar 21, 2023 at 09:54:58AM +0200, Leon Romanovsky wrote: > On Mon, Mar 20, 2023 at 04:18:14PM -0300, Jason Gunthorpe wrote: > > On Thu, Mar 16, 2023 at 03:39:27PM +0200, Leon Romanovsky wrote: > > > From: Patrisious Haddad <phaddad@xxxxxxxxxx> > > > > > > Previously when destroying a DCT, if the firmware function for the > > > destruction failed, the common resource would have been destroyed > > > either way, since it was destroyed before the firmware object. > > > Which leads to kernel warning "refcount_t: underflow" which indicates > > > possible use-after-free. > > > Which is triggered when we try to destroy the common resource for the > > > second time and execute refcount_dec_and_test(&common->refcount). > > > > > > So, currently before destroying the common resource we check its > > > refcount and continue with the destruction only if it isn't zero. > > > > This seems super sketchy > > > > If the destruction fails why not set the refcount back to 1? > > Because destruction will fail in destroy_rq_tracked() which is after > destroy_resource_common(). > > In first destruction attempt, we delete qp from radix tree and wait for all > reference to drop. In order do not undo all this logic (setting 1 alone is > not enough), it is much safer simply skip destroy_resource_common() in reentry > case. This is the bug I pointed a long time ago, it is ordered wrong to remove restrack before destruction is assured Jason