On Tue, Mar 13, 2018 at 04:06:19PM +0200, Leon Romanovsky wrote: > From: Parav Pandit <parav@xxxxxxxxxxxx> > > AH attribute of the cm_id can be overwritten if LAP message is received > on CM request which is in progress. This bug got introduced to avoid > sleeping when spin lock is held as part of commit in Fixes tag. > > Therefore validate the cm_id state first and continue to perform AV > ah_attr initialization outside of the non blocking context. > > Fixes: 33f93e1ebcf5 ("IB/cm: Fix sleeping while spin lock is held") > Reviewed-by: Daniel Jurgens <danielj@xxxxxxxxxxxx> > Signed-off-by: Parav Pandit <parav@xxxxxxxxxxxx> > Signed-off-by: Leon Romanovsky <leon@xxxxxxxxxx> > drivers/infiniband/core/cm.c | 24 ++++++++++++++++-------- > 1 file changed, 16 insertions(+), 8 deletions(-) > > diff --git a/drivers/infiniband/core/cm.c b/drivers/infiniband/core/cm.c > index c5cd1b3ffa54..c9b325040df3 100644 > +++ b/drivers/infiniband/core/cm.c > @@ -3168,12 +3168,6 @@ static int cm_lap_handler(struct cm_work *work) > if (!cm_id_priv) > return -EINVAL; > > - ret = cm_init_av_for_response(work->port, work->mad_recv_wc->wc, > - work->mad_recv_wc->recv_buf.grh, > - &cm_id_priv->av); > - if (ret) > - goto deref; > - > param = &work->cm_event.param.lap_rcvd; > memset(&work->path[0], 0, sizeof(work->path[1])); > cm_path_set_rec_type(work->port->cm_dev->ib_device, > @@ -3218,10 +3212,24 @@ static int cm_lap_handler(struct cm_work *work) > goto unlock; > } > > - cm_id_priv->id.lap_state = IB_CM_LAP_RCVD; > - cm_id_priv->tid = lap_msg->hdr.tid; > + /* It is safe to release spin lock while working on AV > + * initialization from path record. > + * While AV initialization in progress, refcount is taken > + * by cm_acquire_id() at the start, therefore it is safe to > + * release spin lock and re-acquire before updating state. > + */ > + spin_unlock_irq(&cm_id_priv->lock); > + ret = cm_init_av_for_response(work->port, work->mad_recv_wc->wc, > + work->mad_recv_wc->recv_buf.grh, > + &cm_id_priv->av); > + if (ret) > + goto deref; > + > cm_init_av_by_path(param->alternate_path, &cm_id_priv->alt_av, > cm_id_priv); > + spin_lock_irq(&cm_id_priv->lock); > + cm_id_priv->id.lap_state = IB_CM_LAP_RCVD; Oy, this is hard to check. The comment is not good enough.. Firstly this could only even begin to be safe if because cm_lap_handler is run single threaded through a work queue. Indeed every call to cm_init_av_for_response needs to be run through the work queue. But it just isn't.. I'm not sure the history but cm_init_av_by_path looks wrongly locked in other places too. It looks like it should always be called with the spinlock held and it isn't.. So we can call something like ib_send_cm_req from userspace and race it with the above code, and corrupt av. Not sure why ibv_send_cm_req doesn't hold the spinlock while calling cm_init_av_by_path, probably should. I think all the places that write to the av need to update the av under the spin lock. Looks like the best way to do it above is to write a temporary av to the stack and then re-acquire the lock and copy it to the priv ?? All places that write to av need checking.. Jason -- To unsubscribe from this list: send the line "unsubscribe linux-rdma" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html