> -----Original Message----- > From: Jason Gunthorpe > Sent: Monday, September 3, 2018 8:51 PM > To: Dennis Dalessandro <dennis.dalessandro@xxxxxxxxx> > Cc: Leon Romanovsky <leon@xxxxxxxxxx>; Doug Ledford > <dledford@xxxxxxxxxx>; RDMA mailing list <linux-rdma@xxxxxxxxxxxxxxx>; > Daniel Jurgens <danielj@xxxxxxxxxxxx>; Feras Daoud > <ferasda@xxxxxxxxxxxx>; Maor Gottlieb <maorg@xxxxxxxxxxxx>; > Muhammad Sammar <muhammads@xxxxxxxxxxxx>; Parav Pandit > <parav@xxxxxxxxxxxx>; Sean Hefty <sean.hefty@xxxxxxxxx> > Subject: Re: [PATCH rdma-next 6/6] RDMA/cma: Protect cma dev list with lock > > On Tue, Aug 28, 2018 at 02:29:21PM -0400, Dennis Dalessandro wrote: > > On 8/28/2018 2:14 PM, Leon Romanovsky wrote: > > > On Tue, Aug 28, 2018 at 02:02:56PM -0400, Dennis Dalessandro wrote: > > > > On 8/28/2018 1:50 PM, Leon Romanovsky wrote: > > > > > On Tue, Aug 28, 2018 at 01:08:55PM -0400, Dennis Dalessandro wrote: > > > > > > On 8/28/2018 7:45 AM, Leon Romanovsky wrote: > > > > > > > From: Parav Pandit <parav@xxxxxxxxxxxx> > > > > > > > > > > > > > > When AF_IB addresses are used during rdma_resolve_addr(), > > > > > > > currently lock is not held. A cma device can get removed > > > > > > > while such list traversal is in progress which may lead to crash. > > > > > > > > > > > > > > Therefore, hold a lock while traversing the list which > > > > > > > avoids such situation. > > > > > > > This patch is not tested. It was discovered during code review. > > > > > > > > > > > > > > Cc: <stable@xxxxxxxxxxxxxxx> # 3.10 > > > > > > > Fixes: f17df3b0d ("RDMA/cma: Add support for AF_IB to > > > > > > > rdma_resolve_addr()") > > > > > > > Signed-off-by: Parav Pandit <parav@xxxxxxxxxxxx> > > > > > > > Reviewed-by: Daniel Jurgens <danielj@xxxxxxxxxxxx> > > > > > > > Signed-off-by: Leon Romanovsky <leonro@xxxxxxxxxxxx> > > > > > > > > > > > > Really? You are sending something to stable that hasn't been tested? > > > > > > > > > > > > > > > > Why are you saying that? > > > > > > > > > > Thanks, > > > > > > > > > > > > > I know I personally wouldn't let any patches from our drivers hit > > > > the list without being tested, and certainly patches that touch > > > > common code and may impact other drivers would definitely be tested. > > > > > > > > And I would guess this probably was tested or at least run through > > > > your QA, in which case, just strike that sentence from the commit > > > > message. We are trying to establish/maintain a good reputation > > > > here for our subsystem and when the casual reader of the git log > > > > comes along and sees this it tells them, hey RDMA folks take > > > > anything! Not the message we want to send I don't think. > > > > > > I agree with you, the more proper commit message should be: > > > "This patch is not tested. ..." -> "This patch doesn't come as an > > > outcome of real crash, but from code review. It was tested as part > > > of our regular QA submission process without simulation of pre vs. post > specific flows. > > > No new bugs were found by the QA after applying this patch." > > > > > > Is it better? > > > > Much better. I'd even be fine just dropping the sentence, but this > > makes it clear why there isn't a stack trace or anything. > > Well, it is a race, as a general rule, I would prefer to see a CPU ladder diagram > explaining how the race occures so the assertion there is a race can actually be > checked by someone.. > We can consider below ladder diagram. In below ladder diagram as one example, when rdma_resolve_addr() is done from one cpu, other cpu might remove the cma_dev entry from list or deallocate the cma_dev. CPU0 CPU1 ==== ==== rdma_resolve_addr() cma_resolve_ib_dev() [..] // stale entry while traversing list_for_each() cma_remove_one() cur_dev->device mutex_lock(&lock) list_del(); mutex_unlock*&lock); cma_process_remove(); > Reproducing races is hard, so it is not surprising that a test case wasn't created. > > Jason