On 4/23/2021 3:34 AM, Jason Gunthorpe wrote:
On Wed, Apr 21, 2021 at 02:40:37PM +0300, Leon Romanovsky wrote:
@@ -4396,6 +4439,14 @@ static void cm_remove_one(struct ib_device *ib_device, void *client_data)
cm_dev->going_down = 1;
spin_unlock_irq(&cm.lock);
+ list_for_each_entry_safe(cm_id_priv, tmp,
+ &cm_dev->cm_id_priv_list, cm_dev_list) {
+ if (!list_empty(&cm_id_priv->cm_dev_list))
+ list_del(&cm_id_priv->cm_dev_list);
+ cm_id_priv->av.port = NULL;
+ cm_id_priv->alt_av.port = NULL;
+ }
Ugh, this is in the wrong order, it has to be after the work queue
flush..
Hurm, I didn't see an easy way to fix it up, but I did think of a much
better design!
Generally speaking all we need is the memory of the cm_dev and port to
remain active, we don't need to block or fence with cm_remove_one(),
so just stick a memory kref on this thing and keep the memory. The
only things that need to serialize with cm_remove_one() are on the
workqueue or take a spinlock (e.g. because they touch mad_agent).
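
For illustration only, the lifetime rule described above can be sketched in userspace C. This is not the kernel patch: every name here (demo_dev, demo_dev_get, demo_dev_put, demo_remove_one) is hypothetical, and a C11 atomic counter stands in for struct kref. The point it tries to show is that the remove path only marks the device and drops its own reference, while the memory stays valid until the last user lets go.

```c
/* Userspace sketch of the "memory kref" idea; all names are hypothetical. */
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdlib.h>

struct demo_dev {
	atomic_int refcount;	/* stands in for struct kref */
	bool going_down;	/* set once by the remove path */
};

static int demo_frees;		/* counts frees so the demo can check lifetime */

static struct demo_dev *demo_dev_alloc(void)
{
	struct demo_dev *dev = calloc(1, sizeof(*dev));

	if (!dev)
		return NULL;
	atomic_init(&dev->refcount, 1);	/* initial reference held by the owner */
	return dev;
}

static void demo_dev_get(struct demo_dev *dev)
{
	atomic_fetch_add(&dev->refcount, 1);
}

static void demo_dev_put(struct demo_dev *dev)
{
	int old = atomic_fetch_sub(&dev->refcount, 1);

	assert(old > 0);
	if (old == 1) {		/* last reference dropped: release the memory */
		demo_frees++;
		free(dev);
	}
}

/* Remove path: mark the device and drop the owner's reference, no fencing. */
static void demo_remove_one(struct demo_dev *dev)
{
	dev->going_down = true;	/* the real code would set this under a lock */
	demo_dev_put(dev);
}

/* Returns 1 if the memory outlives remove and is freed exactly once. */
int demo_remove_while_referenced(void)
{
	struct demo_dev *dev = demo_dev_alloc();
	bool still_valid;

	if (!dev)
		return -1;
	demo_frees = 0;
	demo_dev_get(dev);	/* a cm_id-like user takes its own reference */
	demo_remove_one(dev);
	still_valid = (demo_frees == 0) && dev->going_down;
	demo_dev_put(dev);	/* the user drops the last reference */
	return still_valid && demo_frees == 1;
}
```

The sketch is single-threaded and deliberately leaves out the workqueue and spinlock cases, which per the text above still have to serialize with cm_remove_one().
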
Try this, I didn't finish every detail, applies on top of your series,
but you'll need to reflow it into new commits:
Thanks Jason, I think we still need a rwlock to protect "av->port"? It
is modified and cleared by cm_set_av_port() and read in many places.