On Tue, May 15, 2018 at 08:51:15PM -0600, Jason Gunthorpe wrote:
> On Tue, May 15, 2018 at 04:41:00PM -0700, Roland Dreier wrote:
> > > This is basically what I was thinking as well..
> > >
> > > My fear is that we have lots and lots of places where the
> > > cma_comp_exch "lock" isn't right in this same sort of way, and
> > > everything has to be audited..
> >
> > OK, glad you like the approach.
> >
> > Yeah, I think at least any function that has a cma_comp_exch() to
> > unwind things in the error return path is suspect. For example, can
> > rdma_connect() race against rdma_set_ib_path()? I can take a more
> > comprehensive look tomorrow.
>
> The thing that gives me pause is that I don't have a clear picture in
> my head of how this entire comp_exch based 'locking' scheme is even
> supposed to work. I just don't get it. It seems completely broken.
>
> Am I missing something? Did it start out OK and just get broken over
> the last decade?
>
> The answer matters, because if adding "ING" states is the right
> design overall then, OK, but if the answer is that this entire thing
> needs to go away and be replaced with a lock, then let's just do that
> instead.
>
> For instance, look at this..
>
> int rdma_set_ib_path(struct rdma_cm_id *id,
>                      struct sa_path_rec *path_rec)
> {
>         if (!cma_comp_exch(id_priv, RDMA_CM_ADDR_RESOLVED,
>                            RDMA_CM_ROUTE_RESOLVED))
>                 return -EINVAL;
>
>         id->route.path_rec = kmemdup(path_rec, sizeof(*path_rec),
>                                      GFP_KERNEL);
> [..]
> err_free:
>         kfree(id->route.path_rec);
>         id->route.path_rec = NULL;
> err:
>         cma_comp_exch(id_priv, RDMA_CM_ROUTE_RESOLVED, RDMA_CM_ADDR_RESOLVED);
>
> So this has the same error-path problem, but even on success it is a
> racy nightmare.
>
> eg I race the above between comp_exch and kmemdup with this:
>
> int rdma_connect(struct rdma_cm_id *id, struct rdma_conn_param *conn_param)
> {
>         if (!cma_comp_exch(id_priv, RDMA_CM_ROUTE_RESOLVED, RDMA_CM_CONNECT))
>                 return -EINVAL;
>
> And then link with this:
>
> static int cma_sidr_rep_handler(struct ib_cm_id *cm_id,
>                                 struct ib_cm_event *ib_event)
> {
>         mutex_lock(&id_priv->handler_mutex);
>         if (id_priv->state != RDMA_CM_CONNECT)
>                 goto out;
> [..]
>         ib_init_ah_attr_from_path(id_priv->id.device,
>                                   id_priv->id.port_num,
>                                   id_priv->id.route.path_rec,
>                                   &event.param.ud.ah_attr);
>
> And go boom with a null pointer deref without even needing the error
> path, or I use the error path and get a more reliable NULL.
>
> Is there some subtle reason why these three calls can't happen
> concurrently? It was easy to find this combination, so I bet there are
> lots more even if this particular avenue is blocked off.
>
> Basically, from what I can tell, the entire thing is totally crazy - I
> don't think we can stomp syzkaller bugs one at a time and ever get
> anywhere when the entire FSM idea seems deeply flawed.
>
> This is why I didn't make a patch yet :)
>
> So, what is the solution here? How should this actually look? Is there
> some sensible design pattern for a multi-threaded comp_exch based FSM
> that I am not aware of?
>
> The most familiar answer is to replace comp_exch with a proper mutex
> held across the entire state entry, action and exit, and forget about
> adding *INGs.
>
> Perhaps we could do that without too much invasion by doing something
> like this:
>
>     if (!cma_state_check_enter(id_priv, RDMA_CM_IDLE, RDMA_CM_ADDR_BOUND))
>     [..]
>     cma_state_success(id_priv);
> err:
>     cma_state_rollback(id_priv);
>
> At least there is some kind of sane and auditable pattern for building
> this FSM.. I think I prefer this to adding a bunch of "ING"s..
>
> The semantic would be that no cma_state_check_enter can succeed during
> the period between enter and success/rollback. Then every single
> function that does cma_exch gets audited and gains a success or
> rollback on its exit path, forming a good try_lock around the entire
> state action block.
>
> At the very least this guarantees that the FSM actions and arcs are
> sane for concurrency - that just leaves auditing the non-arc methods..
>
> But, again, I don't know why it is like this, and I fear it will just
> cause other runtime bugs, as we now cause comp_exch to fail while any
> action is running, which opens a whole other avenue of auditing :(
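To make sure I understand the proposed pattern, here is a minimal
sketch of how such helpers could look. To be clear, none of this
exists in cma.c today: the names come from your mail, and the
state_locked and old_state fields of struct rdma_id_private are
invented for the illustration; only id_priv->lock (the existing state
spinlock) is real.

static bool cma_state_check_enter(struct rdma_id_private *id_priv,
				  enum rdma_cm_state old,
				  enum rdma_cm_state new)
{
	unsigned long flags;
	bool ret = false;

	spin_lock_irqsave(&id_priv->lock, flags);
	/* Refuse entry while another state action is in flight */
	if (id_priv->state == old && !id_priv->state_locked) {
		id_priv->old_state = old;	/* remembered for rollback */
		id_priv->state = new;
		id_priv->state_locked = true;	/* hypothetical new field */
		ret = true;
	}
	spin_unlock_irqrestore(&id_priv->lock, flags);
	return ret;
}

static void cma_state_success(struct rdma_id_private *id_priv)
{
	unsigned long flags;

	spin_lock_irqsave(&id_priv->lock, flags);
	/* Action completed: the new state becomes enterable again */
	id_priv->state_locked = false;
	spin_unlock_irqrestore(&id_priv->lock, flags);
}

static void cma_state_rollback(struct rdma_id_private *id_priv)
{
	unsigned long flags;

	spin_lock_irqsave(&id_priv->lock, flags);
	/* Action failed: restore the state we entered from */
	id_priv->state = id_priv->old_state;
	id_priv->state_locked = false;
	spin_unlock_irqrestore(&id_priv->lock, flags);
}

If I read your semantic right, this gives the try_lock behaviour you
describe: rdma_connect()'s check_enter would fail while
rdma_set_ib_path() is between enter and success/rollback, instead of
racing with the kmemdup.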
If you want more food for thought, the mix of struct work, the
handler_mutex and the regular call paths is really exciting too (line
numbers below are from drivers/infiniband/core/cma.c):

1648 void rdma_destroy_id(struct rdma_cm_id *id)
1649 {
1650         struct rdma_id_private *id_priv;
1651         enum rdma_cm_state state;
1652
1653         id_priv = container_of(id, struct rdma_id_private, id);
1654         state = cma_exch(id_priv, RDMA_CM_DESTROYING);
1655         cma_cancel_operation(id_priv, state);
1656
1657         /*
1658          * Wait for any active callback to finish. New callbacks will find
1659          * the id_priv state set to destroying and abort.
1660          */
1661         mutex_lock(&id_priv->handler_mutex);
1662         mutex_unlock(&id_priv->handler_mutex);

and

2407 static void cma_work_handler(struct work_struct *_work)
2408 {
2409         struct cma_work *work = container_of(_work, struct cma_work, work);
2410         struct rdma_id_private *id_priv = work->id;
2411         int destroy = 0;
2412
2413         mutex_lock(&id_priv->handler_mutex);
2414         if (!cma_comp_exch(id_priv, work->old_state, work->new_state))
2415                 goto out;
2416
2417         if (id_priv->id.event_handler(&id_priv->id, &work->event)) {
2418                 cma_exch(id_priv, RDMA_CM_DESTROYING);
2419                 destroy = 1;
2420         }
2421 out:
2422         mutex_unlock(&id_priv->handler_mutex);

This is most probably racy too in changing the state to
RDMA_CM_DESTROYING, but I have no clue about the design decisions
behind that part of the code.
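To spell out the interleaving that worries me here - this is pure
speculation from reading the code above, not an observed failure, and
I may be missing a guarantee that rules it out:

/*
 * Speculative interleaving:
 *
 *  CPU0: rdma_destroy_id()               CPU1: cma_work_handler()
 *
 *                                        mutex_lock(&handler_mutex);
 *                                        cma_comp_exch(old, new);
 *                                            succeeds, state = new
 *  state = cma_exch(id_priv,
 *                   RDMA_CM_DESTROYING);     silently overwrites new
 *  cma_cancel_operation(id_priv, state);     cancels based on new while
 *                                            the handler is still running
 *  mutex_lock(&handler_mutex);               only now waits for CPU1
 *
 * The empty lock/unlock pair does flush the running handler, but the
 * state was already changed underneath it before the flush, and the
 * cancel ran against a state the handler was still acting on.
 */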
Thanks

> Jason