Re: [PATCH] RDMA/rxe: Fix a race condition related to the QP error state

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Fri, Jan 12, 2018 at 06:32:47AM +0000, Bart Van Assche wrote:
> On Fri, 2018-01-12 at 08:23 +0200, Leon Romanovsky wrote:
> > On Thu, Jan 11, 2018 at 07:07:06PM +0000, Bart Van Assche wrote:
> > > On Thu, 2018-01-11 at 21:00 +0200, Leon Romanovsky wrote:
> > > > On Thu, Jan 11, 2018 at 04:02:33PM +0000, Bart Van Assche wrote:
> > > > > On Thu, 2018-01-11 at 08:22 +0200, Leon Romanovsky wrote:
> > > > > > The proposed patch definitely decreases the chance of races, but it is not fixing them.
> > > > > > There is a chance to have change in qp state immediately after your "if ..." check.
> > > > >
> > > > > Hello Leon,
> > > > >
> > > > > Please have a look at rxe_qp_error() and you will see that the patch I posted
> > > > > is a proper fix. In the scenario you described rxe_qp_error() will trigger a
> > > > > run of rxe_completer().
> > > >
> > > > Bart,
> > > >
> > > > What am I missing?
> > > >
> > > > CPU1							CPU2
> > > > 							if (unlikely....
> > > > 						<---
> > > > /* move the qp to the error state */
> > > > void rxe_qp_error(struct rxe_qp *qp)
> > > > {
> > > >         qp->req.state = QP_STATE_ERROR;
> > > >         qp->resp.state = QP_STATE_ERROR;
> > > >         qp->attr.qp_state = IB_QPS_ERR;
> > > > 						--->
> > > > 							rxe_run_task(&qp->req.task, must_sched);
> > > >
> > > >
> > > >
> > > > It is more or less the same as without "if (unlikely..."
> > >
> > > Hello Leon,
> > >
> > > In the above the part of rxe_qp_error() that I was referring to in my e-mail
> > > is missing:
> > >
> > > 	if (qp_type(qp) == IB_QPT_RC)
> > > 		rxe_run_task(&qp->comp.task, 1);
> >
> >
> > But it is exactly where race exists, as long QP isn't protected, it can
> > switch CPUs and create race.
>
> Hello Leon,
>
> Can you clarify which race you are referring to? rxe_run_task() uses the
> tasklet mechanism and tasklets are guaranteed to run on at most one CPU at a
> time. See also the "Top and Bottom Halves" chapter in Linux Device Drivers,
> 3rd edition. See also the tasklet_schedule() implementation in
> <linux/interrupt.h> and in kernel/softirq.c.

Ahh, Bart, Sorry.

I found the cause of my misunderstanding, it is "comp" in the rxe_run_task
call and not "req".

Thanks

>
> Thanks,
>
> Bart.

Attachment: signature.asc
Description: PGP signature


[Index of Archives]     [Linux Kernel]     [Kernel Development Newbies]     [Linux USB Devel]     [Video for Linux]     [Linux Audio Users]     [Yosemite Hiking]     [Linux Kernel]     [Linux SCSI]