RE: [PATCH rdma-next 6/6] RDMA/cma: Protect cma dev list with lock

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 




> -----Original Message-----
> From: Jason Gunthorpe
> Sent: Monday, September 3, 2018 8:51 PM
> To: Dennis Dalessandro <dennis.dalessandro@xxxxxxxxx>
> Cc: Leon Romanovsky <leon@xxxxxxxxxx>; Doug Ledford
> <dledford@xxxxxxxxxx>; RDMA mailing list <linux-rdma@xxxxxxxxxxxxxxx>;
> Daniel Jurgens <danielj@xxxxxxxxxxxx>; Feras Daoud
> <ferasda@xxxxxxxxxxxx>; Maor Gottlieb <maorg@xxxxxxxxxxxx>;
> Muhammad Sammar <muhammads@xxxxxxxxxxxx>; Parav Pandit
> <parav@xxxxxxxxxxxx>; Sean Hefty <sean.hefty@xxxxxxxxx>
> Subject: Re: [PATCH rdma-next 6/6] RDMA/cma: Protect cma dev list with lock
> 
> On Tue, Aug 28, 2018 at 02:29:21PM -0400, Dennis Dalessandro wrote:
> > On 8/28/2018 2:14 PM, Leon Romanovsky wrote:
> > > On Tue, Aug 28, 2018 at 02:02:56PM -0400, Dennis Dalessandro wrote:
> > > > On 8/28/2018 1:50 PM, Leon Romanovsky wrote:
> > > > > On Tue, Aug 28, 2018 at 01:08:55PM -0400, Dennis Dalessandro wrote:
> > > > > > On 8/28/2018 7:45 AM, Leon Romanovsky wrote:
> > > > > > > From: Parav Pandit <parav@xxxxxxxxxxxx>
> > > > > > >
> > > > > > > When AF_IB addresses are used during rdma_resolve_addr(),
> > > > > > > currently lock is not held. A cma device can get removed
> > > > > > > while such list traversal is in progress which may lead to crash.
> > > > > > >
> > > > > > > Therefore, hold a lock while traversing the list which
> > > > > > > avoids such situation.
> > > > > > > This patch is not tested. It was discovered during code review.
> > > > > > >
> > > > > > > Cc: <stable@xxxxxxxxxxxxxxx> # 3.10
> > > > > > > Fixes: f17df3b0d ("RDMA/cma: Add support for AF_IB to
> > > > > > > rdma_resolve_addr()")
> > > > > > > Signed-off-by: Parav Pandit <parav@xxxxxxxxxxxx>
> > > > > > > Reviewed-by: Daniel Jurgens <danielj@xxxxxxxxxxxx>
> > > > > > > Signed-off-by: Leon Romanovsky <leonro@xxxxxxxxxxxx>
> > > > > >
> > > > > > Really? You are sending something to stable that hasn't been tested?
> > > > > >
> > > > >
> > > > > Why are you saying that?
> > > > >
> > > > > Thanks,
> > > > >
> > > >
> > > > I know I personally wouldn't let any patches from our drivers hit
> > > > the list without being tested, and certainly patches that touch
> > > > common code and may impact other drivers would definitely be tested.
> > > >
> > > > And I would guess this probably was tested or at least run through
> > > > your QA, in which case, just strike that sentence from the commit
> > > > message. We are trying to establish/maintain a good reputation
> > > > here for our subsystem and when the casual reader of the git log
> > > > comes along and sees this it tells them, hey RDMA folks take
> > > > anything! Not the message we want to send I don't think.
> > >
> > > I agree with you, the more proper commit message should be:
> > > "This patch is not tested. ..." -> "This patch doesn't come as an
> > > outcome of real crash, but from code review. It was tested as part
> > > of our regular QA submission process without simulation of pre vs. post
> specific flows.
> > > No new bugs were found by the QA after applying this patch."
> > >
> > > Is it better?
> >
> > Much better. I'd even be fine just dropping the sentence, but this
> > makes it clear why there isn't a stack trace or anything.
> 
> Well, it is a race, as a general rule, I would prefer to see a CPU ladder diagram
> explaining how the race occures so the assertion there is a race can actually be
> checked by someone..
> 
We can consider below ladder diagram.

In below ladder diagram as one example, when rdma_resolve_addr()
is done from one cpu, other cpu might remove the cma_dev entry from list
or deallocate the cma_dev.

               CPU0                                                CPU1
               ====                                                 ====
rdma_resolve_addr()
   cma_resolve_ib_dev()                               [..]
       // stale entry while traversing
       list_for_each()                                 cma_remove_one()
          cur_dev->device                          mutex_lock(&lock)
                                                                  list_del();
                                                                 mutex_unlock*&lock);
                                                                 cma_process_remove();

> Reproducing races is hard, so it is not surprising that a test case wasn't created.
> 
> Jason




[Index of Archives]     [Linux USB Devel]     [Video for Linux]     [Linux Audio Users]     [Photo]     [Yosemite News]     [Yosemite Photos]     [Linux Kernel]     [Linux SCSI]     [XFree86]

  Powered by Linux