Re: [PATCH rdma-next v2 7/9] IB/cm: Clear all associated AV's ports when remove a cm device

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Fri, Apr 23, 2021 at 09:14:21PM +0800, Mark Zhang wrote:
> 
> 
> On 4/23/2021 3:34 AM, Jason Gunthorpe wrote:
> > On Wed, Apr 21, 2021 at 02:40:37PM +0300, Leon Romanovsky wrote:
> > > @@ -4396,6 +4439,14 @@ static void cm_remove_one(struct ib_device *ib_device, void *client_data)
> > >   	cm_dev->going_down = 1;
> > >   	spin_unlock_irq(&cm.lock);
> > > +	list_for_each_entry_safe(cm_id_priv, tmp,
> > > +				 &cm_dev->cm_id_priv_list, cm_dev_list) {
> > > +		if (!list_empty(&cm_id_priv->cm_dev_list))
> > > +			list_del(&cm_id_priv->cm_dev_list);
> > > +		cm_id_priv->av.port = NULL;
> > > +		cm_id_priv->alt_av.port = NULL;
> > > +	}
> > 
> > Ugh, this is in the wrong order, it has to be after the work queue
> > flush..
> > 
> > Hurm, I didn't see an easy way to fix it up, but I did think of a much
> > better design!
> > 
> > Generally speaking all we need is the memory of the cm_dev and port to
> > remain active, we don't need to block or fence with cm_remove_one(),
> > so just stick a memory kref on this thing and keep the memory. The
> > only things that needs to seralize with cm_remove_one() are on the
> > workqueue or take a spinlock (eg because they touch mad_agent)
> > 
> > Try this, I didn't finish every detail, applies on top of your series,
> > but you'll need to reflow it into new commits:
> 
> Thanks Jason, I think we still need a rwlock to protect "av->port"? It is
> modified and cleared by cm_set_av_port() and read in many places.

Hum..

This is a real mess.

It looks to me like any access to the av->port should always be
protected by the cm_id_priv->lock

Most already are, but the sets are wrong and a couple readers are wrong

Set reverse call chains:

cm_init_av_for_lap()
 cm_lap_handler(work) (ok)

cm_init_av_for_response()
 cm_req_handler(work) (OK, cm_id_priv is on stack)
 cm_sidr_req_handler(work) (OK, cm_id_priv is on stack)

cm_init_av_by_path()
 cm_req_handler(work) (OK, cm_id_priv is on stack)
 cm_lap_handler(work) (OK)
 ib_send_cm_req() (not locked)
   cma_connect_ib()
    rdma_connect_locked()
     [..]
   ipoib_cm_send_req()
   srp_send_req()
     srp_connect_ch()
      [..]
 ib_send_cm_sidr_req() (not locked)
  cma_resolve_ib_udp()
   rdma_connect_locked()

And
  cm_destroy_id() (locked)

And read reverse call chains:

cm_alloc_msg()
 ib_send_cm_req() (not locked)
 ib_send_cm_rep() (OK)
 ib_send_cm_rtu() (OK)
 cm_send_dreq_locked() (OK)
 cm_send_drep_locked() (OK)
 cm_send_rej_locked() (OK)
 ib_send_cm_mra() (OK)
 ib_send_cm_sidr_req() (not locked)
 cm_send_sidr_rep_locked() (OK)
cm_form_tid()
 cm_format_req()
  ib_send_cm_req() (sort of OK)
 cm_format_dreq()
   cm_send_dreq_locked (OK)
cm_format_req()
  ib_send_cm_req() (sort of OK)
cm_format_req_event()
 cm_req_handler() (OK, cm_id_priv is on stack)
cm_format_rep()
 ib_send_cm_rep() (OK)
cm_rep_handler(work) (OK)
cm_establish_handler(work) (OK)
cm_rtu_handler(work) (OK)
cm_send_dreq_locked() (OK)
cm_dreq_handler(work) (OK)
cm_drep_handler(work) (OK)
cm_rej_handler(work) (OK)
cm_mra_handler(work) (OK)
cm_apr_handler(work) (OK)
cm_sidr_rep_handler(work) (OK)
cm_init_qp_init_attr() (OK)
cm_init_qp_rtr_attr() (OK)
cm_init_qp_rts_attr() (OK)

So.. That leaves these functions that are not obviously locked
correctly:
 ib_send_cm_req()
 ib_send_cm_sidr_req()

And the way their locking expects to work is basically because they
expect that there are not parallel touches to the cm_id - however I'm
doubtful this is completely true.

So no new lock needed, but something should be done about the above
two functions, and this should be documented

Jason



[Index of Archives]     [Linux USB Devel]     [Video for Linux]     [Linux Audio Users]     [Photo]     [Yosemite News]     [Yosemite Photos]     [Linux Kernel]     [Linux SCSI]     [XFree86]

  Powered by Linux