On Tue, May 15, 2018 at 08:51:15PM -0600, Jason Gunthorpe wrote:
> On Tue, May 15, 2018 at 04:41:00PM -0700, Roland Dreier wrote:
> > > This is basically what I was thinking as well..
> > >
> > > My fear is that we have lots and lots of places where the
> > > cma_comp_exch "lock" isn't right in this same sort of way, and
> > > everything has to be audited..
> >
> > OK, glad you like the approach.
> >
> > Yeah, I think at least any function that has a cma_comp_exch() to
> > unwind things in the error return path is suspect. For example, can
> > rdma_connect() race against rdma_set_ib_path()? I can take a more
> > comprehensive look tomorrow.
>
> The thing that gives me pause is that I don't have a clear picture in
> my head of how this entire comp_exch based 'locking' scheme is even
> supposed to work. I just don't get it. It seems completely broken.
>
> Am I missing something? Did it start out OK and just get broken over
> the last decade?
>
> The answer matters, because if adding "ING" states is the right
> design overall then, OK, but if the answer is that this entire thing
> needs to go away and be replaced with a lock, then let's just do that
> instead.
>
> For instance, look at this..
>
> int rdma_set_ib_path(struct rdma_cm_id *id,
>                      struct sa_path_rec *path_rec)
> {
>         if (!cma_comp_exch(id_priv, RDMA_CM_ADDR_RESOLVED,
>                            RDMA_CM_ROUTE_RESOLVED))
>                 return -EINVAL;
>
>         id->route.path_rec = kmemdup(path_rec, sizeof(*path_rec),
>                                      GFP_KERNEL);
> [..]
> err_free:
>         kfree(id->route.path_rec);
>         id->route.path_rec = NULL;
> err:
>         cma_comp_exch(id_priv, RDMA_CM_ROUTE_RESOLVED, RDMA_CM_ADDR_RESOLVED);
>
> So this has the same error-path problem, but even on success it is a
> racy nightmare.
>
> eg I race the above between comp_exch and kmemdup with this:
>
> int rdma_connect(struct rdma_cm_id *id, struct rdma_conn_param *conn_param)
> {
>         if (!cma_comp_exch(id_priv, RDMA_CM_ROUTE_RESOLVED, RDMA_CM_CONNECT))
>                 return -EINVAL;
>
> And then link with this:
>
> static int cma_sidr_rep_handler(struct ib_cm_id *cm_id,
>                                 struct ib_cm_event *ib_event)
> {
>         mutex_lock(&id_priv->handler_mutex);
>         if (id_priv->state != RDMA_CM_CONNECT)
>                 goto out;
> [..]
>         ib_init_ah_attr_from_path(id_priv->id.device,
>                                   id_priv->id.port_num,
>                                   id_priv->id.route.path_rec,
>                                   &event.param.ud.ah_attr);
>
> And go boom with a null pointer deref without even needing the error
> path, or I use the error path and get a more reliable NULL.
>
> Is there some subtle reason why these three calls can't happen
> concurrently? It was easy to find this combination, so I bet there are
> lots more even if this particular avenue is blocked off.
>
> Basically, from what I can tell, the entire thing is totally crazy - I
> don't think we can stomp syzkaller bugs one at a time and ever get
> anywhere when the entire FSM idea seems deeply flawed.
>
> This is why I didn't make a patch yet :)
>
> So, what is the solution here? How should this actually look? Is there
> some sensible design pattern for a multi-threaded comp_exch based FSM
> that I am not aware of?
>
> The most familiar answer is to replace comp_exch with a proper mutex
> held across the entire state entry, action and exit, and forget about
> adding *INGs.
>
> Perhaps we could do that without too much invasion by doing something
> like this:
>
>     if (!cma_state_check_enter(id_priv, RDMA_CM_IDLE, RDMA_CM_ADDR_BOUND))
>     [..]
>     cma_state_success(id_priv);
> err:
>     cma_state_rollback(id_priv);
>
> At least there is some kind of sane and auditable pattern for building
> this FSM.. I think I prefer this to adding a bunch of "ING"s..
>
> The semantic would be that no cma_state_check_enter can succeed during
> the period between enter and success/rollback. Then every single
> function that does cma_exch gets audited and gains a success or
> rollback on its exit path, forming a good try_lock around the entire
> state action block.
>
> At the very least this guarantees that the FSM actions and arcs are
> sane for concurrency - that just leaves auditing the non-arc methods..
>
> But, again, I don't know why it is like this, and I fear it will just
> cause other runtime bugs, as we now cause comp_exch to fail while any
> action is running, which opens a whole other avenue of auditing :(
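To make sure I understand the proposed pattern, here is a minimal
sketch of how such helpers could look. To be clear, none of this
exists in cma.c today: the names come from your mail, and the
state_locked and old_state fields of struct rdma_id_private are
invented for the illustration; only id_priv->lock (the existing state
spinlock) is real.

static bool cma_state_check_enter(struct rdma_id_private *id_priv,
				  enum rdma_cm_state old,
				  enum rdma_cm_state new)
{
	unsigned long flags;
	bool ret = false;

	spin_lock_irqsave(&id_priv->lock, flags);
	/* Refuse entry while another state action is in flight */
	if (id_priv->state == old && !id_priv->state_locked) {
		id_priv->old_state = old;	/* remembered for rollback */
		id_priv->state = new;
		id_priv->state_locked = true;	/* hypothetical new field */
		ret = true;
	}
	spin_unlock_irqrestore(&id_priv->lock, flags);
	return ret;
}

static void cma_state_success(struct rdma_id_private *id_priv)
{
	unsigned long flags;

	spin_lock_irqsave(&id_priv->lock, flags);
	/* Action completed: the new state becomes enterable again */
	id_priv->state_locked = false;
	spin_unlock_irqrestore(&id_priv->lock, flags);
}

static void cma_state_rollback(struct rdma_id_private *id_priv)
{
	unsigned long flags;

	spin_lock_irqsave(&id_priv->lock, flags);
	/* Action failed: restore the state we entered from */
	id_priv->state = id_priv->old_state;
	id_priv->state_locked = false;
	spin_unlock_irqrestore(&id_priv->lock, flags);
}

If I read your semantic right, this gives the try_lock behaviour you
describe: rdma_connect()'s check_enter would fail while
rdma_set_ib_path() is between enter and success/rollback, instead of
racing with the kmemdup.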
If you want more food for thought, the mix of struct work, the
handler_mutex and the regular call paths is really exciting too (line
numbers below are from drivers/infiniband/core/cma.c):

1648 void rdma_destroy_id(struct rdma_cm_id *id)
1649 {
1650         struct rdma_id_private *id_priv;
1651         enum rdma_cm_state state;
1652
1653         id_priv = container_of(id, struct rdma_id_private, id);
1654         state = cma_exch(id_priv, RDMA_CM_DESTROYING);
1655         cma_cancel_operation(id_priv, state);
1656
1657         /*
1658          * Wait for any active callback to finish. New callbacks will find
1659          * the id_priv state set to destroying and abort.
1660          */
1661         mutex_lock(&id_priv->handler_mutex);
1662         mutex_unlock(&id_priv->handler_mutex);

and

2407 static void cma_work_handler(struct work_struct *_work)
2408 {
2409         struct cma_work *work = container_of(_work, struct cma_work, work);
2410         struct rdma_id_private *id_priv = work->id;
2411         int destroy = 0;
2412
2413         mutex_lock(&id_priv->handler_mutex);
2414         if (!cma_comp_exch(id_priv, work->old_state, work->new_state))
2415                 goto out;
2416
2417         if (id_priv->id.event_handler(&id_priv->id, &work->event)) {
2418                 cma_exch(id_priv, RDMA_CM_DESTROYING);
2419                 destroy = 1;
2420         }
2421 out:
2422         mutex_unlock(&id_priv->handler_mutex);

This is most probably racy too in changing the state to
RDMA_CM_DESTROYING, but I have no clue about the design decisions
behind that part of the code.
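To spell out the interleaving that worries me here - this is pure
speculation from reading the code above, not an observed failure, and
I may be missing a guarantee that rules it out:

/*
 * Speculative interleaving:
 *
 *  CPU0: rdma_destroy_id()               CPU1: cma_work_handler()
 *
 *                                        mutex_lock(&handler_mutex);
 *                                        cma_comp_exch(old, new);
 *                                            succeeds, state = new
 *  state = cma_exch(id_priv,
 *                   RDMA_CM_DESTROYING);     silently overwrites new
 *  cma_cancel_operation(id_priv, state);     cancels based on new while
 *                                            the handler is still running
 *  mutex_lock(&handler_mutex);               only now waits for CPU1
 *
 * The empty lock/unlock pair does flush the running handler, but the
 * state was already changed underneath it before the flush, and the
 * cancel ran against a state the handler was still acting on.
 */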
Thanks

> Jason