RE: [PATCH rdma-next 6/6] IB/core: Fix addr_req list delete corruption due to re-enqueue

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 




> -----Original Message-----
> From: Jason Gunthorpe
> Sent: Sunday, March 25, 2018 9:59 PM
> To: Leon Romanovsky <leon@xxxxxxxxxx>
> Cc: Doug Ledford <dledford@xxxxxxxxxx>; Leon Romanovsky
> <leonro@xxxxxxxxxxxx>; RDMA mailing list <linux-rdma@xxxxxxxxxxxxxxx>;
> Daniel Jurgens <danielj@xxxxxxxxxxxx>; Mark Bloch <markb@xxxxxxxxxxxx>;
> Matan Barak <matanb@xxxxxxxxxxxx>; Parav Pandit <parav@xxxxxxxxxxxx>
> Subject: Re: [PATCH rdma-next 6/6] IB/core: Fix addr_req list delete corruption
> due to re-enqueue
> 
> On Sun, Mar 25, 2018 at 01:40:24PM +0300, Leon Romanovsky wrote:
> > From: Parav Pandit <parav@xxxxxxxxxxxx>
> >
> > mod_delayed_work() may enqueue the work item even if it is getting
> > executed while it is getting rescheduled via mod_delayed_work().
> >
> > When process_one_req() is about to get executed, rdma_cancel_addr()
> > modifies the delay to 0.
> >
> > This results into enqueueing the freed addr_req and work item again.
> > Below trace shows that one request is getting executed twice results
> > into list del corruption kernel crash.
> >
> > Therefore, let the request gets completed whoever sees it first; if
> > process_one_req() sees that rdma_addr_cancel() already progressed to
> > cancel the request, it ignore the work item processing.
> >
> > Cancel request waits for any existing work item to finish execution or
> > cancel any past execution. This ensures that cancel routine doesn't
> > free the request that workqueue might schedule.
> >
> > list_del corruption, 0000000009223578->prev is
> > LIST_POISON2 (000000000f85575b)
> > WARNING: CPU: 3 PID: 340 at lib/list_debug.c:50
> > __list_del_entry_valid+0x9a/0xa0
> > CPU: 3 PID: 340 Comm: kworker/u16:4 Not tainted 4.15.0-rc2-bfix+ #26
> > Hardware name: Supermicro SYS-6028U-TR4+/X10DRU-i+, BIOS 2.0b
> > 08/09/2016
> > Workqueue: ib_addr process_one_req [ib_core]
> > task: 00000000e0ed2fa9 task.stack: 00000000e1ddfff5
> > RIP: 0010:__list_del_entry_valid+0x9a/0xa0
> > RSP: 0018:ffffc9000044fde0 EFLAGS: 00010282
> > RAX: 0000000000000000 RBX: ffff880f8b82c200 RCX: 0000000000000000
> > RDX: 0000000000000001 RSI: ffff88107fccd078 RDI: ffff88107fccd078
> > RBP: ffffc9000044fde0 R08: 0000000000000000 R09: 0000000000000000
> > R10: 0000000000000001 R11: 0000000000000000 R12: ffff880f8b82c200
> > R13: ffff880f8b82c210 R14: ffff880f8b82c340 R15: ffff880f8b82c338
> > FS:  0000000000000000(0000) GS:ffff88107fcc0000(0000)
> > knlGS:0000000000000000
> > CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> > CR2: 00007ffec6001228 CR3: 0000000001c10004 CR4: 00000000003606e0
> > DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> > DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 Call
> > Trace:
> > __list_del_entry+0xd/0x30 [ib_core]
> > process_one_req+0x9f/0x140 [ib_core]
> > process_one_work+0x1e3/0x5f0
> > ? process_one_work+0x156/0x5f0
> > worker_thread+0x4d/0x3e0
> > kthread+0x14b/0x180
> > ? process_one_work+0x5f0/0x5f0
> > ? kthread_stop+0x2b0/0x2b0
> > ret_from_fork+0x24/0x30
> >
> > Fixes: 5fff41e1f89d ("IB/core: Fix race condition in resolving IP to
> > MAC")
> > Reviewed-by: Mark Bloch <markb@xxxxxxxxxxxx>
> > Signed-off-by: Parav Pandit <parav@xxxxxxxxxxxx>
> > Signed-off-by: Leon Romanovsky <leonro@xxxxxxxxxxxx>
> > ---
> >  drivers/infiniband/core/addr.c | 54
> > ++++++++++++++++++++++++++++++++----------
> >  1 file changed, 41 insertions(+), 13 deletions(-)
> 
> As already discussed I will use the one line version instead of this.
> 
> When I first glanced at this patch I thought it was creating a sensible fence in
> cancel, but it turns out it doesn't even do that...
It does by canceled_delayed_work_sync and invoking the callback too.
Regardless, as we discussed, one liner patch looks fine and follow on patches that we discussed and reviewed in github at [1].

[1] https://github.com/jgunthorpe/linux/commits/cma-fix

> 
> So the two accomplish exactly the same thing and the other patch is much more
> understandable.
> 
> I have a small series here that tidies this whoel flow up..
> 
> https://github.com/jgunthorpe/linux/commits/cma-fix
> 
> Jason
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html




[Index of Archives]     [Linux USB Devel]     [Video for Linux]     [Linux Audio Users]     [Photo]     [Yosemite News]     [Yosemite Photos]     [Linux Kernel]     [Linux SCSI]     [XFree86]

  Powered by Linux