From: Leon Romanovsky <leonro@xxxxxxxxxxxx> >From Jason: cm_id.state is a non-atomic value that must always be read and written under lock, or while the thread has the only pointer to the cm_id. Critically, during MAD handling the cm_id.state is used to control when MAD handlers can run, and in turn what data they can touch. Without locking, an assignment to state can immediately allow concurrent MAD handlers to execute, potentially creating a mess. Several of these cases only risk load/store tearing, but create very confusing code. For instance changing the state from IB_CM_IDLE to IB_CM_LISTEN doesn't allow any MAD handlers to run in either state, but a superficial audit would suggest that it is not locked properly. This loose methodology has allowed two bugs to creep in. After creating an ID the code did not lock the state transition, apparently mistakenly assuming that the new ID could not be used concurrently. However, the ID is immediately placed in the xarray and so a carefully crafted network sequence could trigger races with the unlocked stores. The main solution to many of these problems is to use the xarray to create a two stage add - the first reserves the ID and the second publishes the pointer. The second stage is either omitted entirely or moved after the newly created ID is setup. Where it is trivial to do so other places directly put the state manipulation under lock, or add an assertion that it is, in fact, under lock. This also removes a number of places where the state is being read under lock, then the lock dropped, reacquired and state tested again. There remain other issues related to missing locking on cm_id data. Thanks ------------------------------------------------------------------------ It is based on rdma-next + rdma-rc patch c14dfddbd869 ("RMDA/cm: Fix missing ib_cm_destroy_id() in ib_cm_insert_listen()") Jason Gunthorpe (15): RDMA/cm: Fix ordering of xa_alloc_cyclic() in ib_create_cm_id() RDMA/cm: Fix checking for allowed duplicate listens RDMA/cm: Remove a race freeing timewait_info RDMA/cm: Make the destroy_id flow more robust RDMA/cm: Simplify establishing a listen cm_id RDMA/cm: Read id.state under lock when doing pr_debug() RDMA/cm: Make it clear that there is no concurrency in cm_sidr_req_handler() RDMA/cm: Make it clearer how concurrency works in cm_req_handler() RDMA/cm: Add missing locking around id.state in cm_dup_req_handler RDMA/cm: Add some lockdep assertions for cm_id_priv->lock RDMA/cm: Allow ib_send_cm_dreq() to be done under lock RDMA/cm: Allow ib_send_cm_drep() to be done under lock RDMA/cm: Allow ib_send_cm_rej() to be done under lock RDMA/cm: Allow ib_send_cm_sidr_rep() to be done under lock RDMA/cm: Make sure the cm_id is in the IB_CM_IDLE state in destroy drivers/infiniband/core/cm.c | 732 ++++++++++++++++++++--------------- 1 file changed, 420 insertions(+), 312 deletions(-) -- 2.24.1