On Thu, Jan 25, 2018 at 01:10:23PM -0700, Jason Gunthorpe wrote: > On Thu, Jan 25, 2018 at 09:44:36PM +0200, Leon Romanovsky wrote: > > > > + /* > > > > + * @kref: Protect destroy of the resource > > > > + */ > > > > + struct kref kref; > > > > > > Sticking a kref in a random structure that is itself not malloc'd is a > > > big red flag: > > > > It is not "rand", but embedded part of ib_qp,ib_pd and ib_cq. It is > > malloced as part of their creation. > > I mean the kref isn't in any way linked the lifetime of the malloc > that contains it. So it isn't really a kref, it is just a refcount. For now, yes, in the future no. IMHO it is the direction to manage RDMA objects. > > > > @@ -1746,6 +1756,11 @@ struct ib_qp { > > > + struct rdma_restrack_root res; > > > > > > > > > Not seeing how this works at all. > > > > > > qp does this: > > > > > > + rdma_restrack_del(&qp->res); > > > ret = qp->device->destroy_qp(qp); > > > > > > And for instance the mlx5 implementation of destroy_qp() does: > > > > > > int mlx5_ib_destroy_qp(struct ib_qp *qp) > > > { > > > struct mlx5_ib_qp *mqp = to_mqp(qp); > > > [..] > > > > > > kfree(mqp); > > > > > > So we kfree the kref containing memory outside the kref's release > > > function. Super big red flag. > > > > It is worth to look on the actual implementation, the > > rdma_restrack_del() doesn't do a lot: removes from the list and marks > > the resource as not valid. > > It doesn't even do that if the kref count is elevated. > > Which is the whole point, the rdma_restrack_del *doesn't* prevent the > use-after-free on the read side. I understand it and agree that it is the bug, which is very easy to fix. > > > > The best idea I've had to make this locking work is what I emailed you > > > a few days ago.. > > > > It is more or less the same idea, but needs to move > > down_write(&dev->res.rwsem) to be part of rdma_restrack_del and not > > release routine. It will cause to block the progress of release. > > That doesn't help - the kref is still totally non-functional as a kref: > > CPU0 CPU1 > > down_read() > rdma_restrack_get() > up_read() > down_write() > restrack_release() > up_write() > kfree(qp) > fill_res_qp_entry(qp) > // boom, use after free > > A kref is not a lock, and a kref that *doesn't* put kfree in the > release function is probably not a kref but a refcount. > > What I sent you wasn't remotely like this, it had two nested locks, > the outer mutex and a very special open coded inner read/write lock > that doesn't deadlock when nested.. I still didn't like your approach, because it has three words which i don't want to see here: "special open-coded variant". I believe that the way to go is to add completion structure and convert _del to be something like that: void rdma_restrack_del(struct rdma_restrack_entry *res) { if (!rdma_restrack_put(res)) /* it is pseudo-code */ wait_for_completion(!res->valid); return; } > > Jason > -- > To unsubscribe from this list: send the line "unsubscribe linux-rdma" in > the body of a message to majordomo@xxxxxxxxxxxxxxx > More majordomo info at http://vger.kernel.org/majordomo-info.html
Attachment:
signature.asc
Description: PGP signature