> -----Original Message----- > From: Jason Gunthorpe > Sent: Sunday, March 25, 2018 9:59 PM > To: Leon Romanovsky <leon@xxxxxxxxxx> > Cc: Doug Ledford <dledford@xxxxxxxxxx>; Leon Romanovsky > <leonro@xxxxxxxxxxxx>; RDMA mailing list <linux-rdma@xxxxxxxxxxxxxxx>; > Daniel Jurgens <danielj@xxxxxxxxxxxx>; Mark Bloch <markb@xxxxxxxxxxxx>; > Matan Barak <matanb@xxxxxxxxxxxx>; Parav Pandit <parav@xxxxxxxxxxxx> > Subject: Re: [PATCH rdma-next 6/6] IB/core: Fix addr_req list delete corruption > due to re-enqueue > > On Sun, Mar 25, 2018 at 01:40:24PM +0300, Leon Romanovsky wrote: > > From: Parav Pandit <parav@xxxxxxxxxxxx> > > > > mod_delayed_work() may enqueue the work item even if it is getting > > executed while it is getting rescheduled via mod_delayed_work(). > > > > When process_one_req() is about to get executed, rdma_cancel_addr() > > modifies the delay to 0. > > > > This results into enqueueing the freed addr_req and work item again. > > Below trace shows that one request is getting executed twice results > > into list del corruption kernel crash. > > > > Therefore, let the request gets completed whoever sees it first; if > > process_one_req() sees that rdma_addr_cancel() already progressed to > > cancel the request, it ignore the work item processing. > > > > Cancel request waits for any existing work item to finish execution or > > cancel any past execution. This ensures that cancel routine doesn't > > free the request that workqueue might schedule. > > > > list_del corruption, 0000000009223578->prev is > > LIST_POISON2 (000000000f85575b) > > WARNING: CPU: 3 PID: 340 at lib/list_debug.c:50 > > __list_del_entry_valid+0x9a/0xa0 > > CPU: 3 PID: 340 Comm: kworker/u16:4 Not tainted 4.15.0-rc2-bfix+ #26 > > Hardware name: Supermicro SYS-6028U-TR4+/X10DRU-i+, BIOS 2.0b > > 08/09/2016 > > Workqueue: ib_addr process_one_req [ib_core] > > task: 00000000e0ed2fa9 task.stack: 00000000e1ddfff5 > > RIP: 0010:__list_del_entry_valid+0x9a/0xa0 > > RSP: 0018:ffffc9000044fde0 EFLAGS: 00010282 > > RAX: 0000000000000000 RBX: ffff880f8b82c200 RCX: 0000000000000000 > > RDX: 0000000000000001 RSI: ffff88107fccd078 RDI: ffff88107fccd078 > > RBP: ffffc9000044fde0 R08: 0000000000000000 R09: 0000000000000000 > > R10: 0000000000000001 R11: 0000000000000000 R12: ffff880f8b82c200 > > R13: ffff880f8b82c210 R14: ffff880f8b82c340 R15: ffff880f8b82c338 > > FS: 0000000000000000(0000) GS:ffff88107fcc0000(0000) > > knlGS:0000000000000000 > > CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 > > CR2: 00007ffec6001228 CR3: 0000000001c10004 CR4: 00000000003606e0 > > DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 > > DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 Call > > Trace: > > __list_del_entry+0xd/0x30 [ib_core] > > process_one_req+0x9f/0x140 [ib_core] > > process_one_work+0x1e3/0x5f0 > > ? process_one_work+0x156/0x5f0 > > worker_thread+0x4d/0x3e0 > > kthread+0x14b/0x180 > > ? process_one_work+0x5f0/0x5f0 > > ? kthread_stop+0x2b0/0x2b0 > > ret_from_fork+0x24/0x30 > > > > Fixes: 5fff41e1f89d ("IB/core: Fix race condition in resolving IP to > > MAC") > > Reviewed-by: Mark Bloch <markb@xxxxxxxxxxxx> > > Signed-off-by: Parav Pandit <parav@xxxxxxxxxxxx> > > Signed-off-by: Leon Romanovsky <leonro@xxxxxxxxxxxx> > > --- > > drivers/infiniband/core/addr.c | 54 > > ++++++++++++++++++++++++++++++++---------- > > 1 file changed, 41 insertions(+), 13 deletions(-) > > As already discussed I will use the one line version instead of this. > > When I first glanced at this patch I thought it was creating a sensible fence in > cancel, but it turns out it doesn't even do that... It does by canceled_delayed_work_sync and invoking the callback too. Regardless, as we discussed, one liner patch looks fine and follow on patches that we discussed and reviewed in github at [1]. [1] https://github.com/jgunthorpe/linux/commits/cma-fix > > So the two accomplish exactly the same thing and the other patch is much more > understandable. > > I have a small series here that tidies this whoel flow up.. > > https://github.com/jgunthorpe/linux/commits/cma-fix > > Jason -- To unsubscribe from this list: send the line "unsubscribe linux-rdma" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html