On Fri, 2018-01-12 at 08:23 +0200, Leon Romanovsky wrote: > On Thu, Jan 11, 2018 at 07:07:06PM +0000, Bart Van Assche wrote: > > On Thu, 2018-01-11 at 21:00 +0200, Leon Romanovsky wrote: > > > On Thu, Jan 11, 2018 at 04:02:33PM +0000, Bart Van Assche wrote: > > > > On Thu, 2018-01-11 at 08:22 +0200, Leon Romanovsky wrote: > > > > > The proposed patch definitely decreases the chance of races, but it is not fixing them. > > > > > There is a chance to have change in qp state immediately after your "if ..." check. > > > > > > > > Hello Leon, > > > > > > > > Please have a look at rxe_qp_error() and you will see that the patch I posted > > > > is a proper fix. In the scenario you described rxe_qp_error() will trigger a > > > > run of rxe_completer(). > > > > > > Bart, > > > > > > What am I missing? > > > > > > CPU1 CPU2 > > > if (unlikely.... > > > <--- > > > /* move the qp to the error state */ > > > void rxe_qp_error(struct rxe_qp *qp) > > > { > > > qp->req.state = QP_STATE_ERROR; > > > qp->resp.state = QP_STATE_ERROR; > > > qp->attr.qp_state = IB_QPS_ERR; > > > ---> > > > rxe_run_task(&qp->req.task, must_sched); > > > > > > > > > > > > It is more or less the same as without "if (unlikely..." > > > > Hello Leon, > > > > In the above the part of rxe_qp_error() that I was referring to in my e-mail > > is missing: > > > > if (qp_type(qp) == IB_QPT_RC) > > rxe_run_task(&qp->comp.task, 1); > > > But it is exactly where race exists, as long QP isn't protected, it can > switch CPUs and create race. Hello Leon, Can you clarify which race you are referring to? rxe_run_task() uses the tasklet mechanism and tasklets are guaranteed to run on at most one CPU at a time. See also the "Top and Bottom Halves" chapter in Linux Device Drivers, 3rd edition. See also the tasklet_schedule() implementation in <linux/interrupt.h> and in kernel/softirq.c. Thanks, Bart.