On 22/03/2018 22:04, Jason Gunthorpe wrote: > process_one_req() can race with rdma_addr_cancel(): > > CPU0 CPU1 > ==== ==== > process_one_work() > debug_work_deactivate(work); > process_one_req() > rdma_addr_cancel() > mutex_lock(&lock); > set_timeout(&req->work,..); > __queue_work() > debug_work_activate(work); > mutex_unlock(&lock); > > mutex_lock(&lock); > [..] > list_del(&req->list); > mutex_unlock(&lock); > [..] > > // ODEBUG explodes since the work is still queued. > kfree(req); > > Causing ODEBUG to detect the use after free: > > ODEBUG: free active (active state 0) object type: work_struct hint: process_one_req+0x0/0x6c0 include/net/dst.h:165 > WARNING: CPU: 0 PID: 79 at lib/debugobjects.c:291 debug_print_object+0x166/0x220 lib/debugobjects.c:288 > kvm: emulating exchange as write > Kernel panic - not syncing: panic_on_warn set ... > > CPU: 0 PID: 79 Comm: kworker/u4:3 Not tainted 4.16.0-rc6+ #361 > Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011 > Workqueue: ib_addr process_one_req > Call Trace: > __dump_stack lib/dump_stack.c:17 [inline] > dump_stack+0x194/0x24d lib/dump_stack.c:53 > panic+0x1e4/0x41c kernel/panic.c:183 > __warn+0x1dc/0x200 kernel/panic.c:547 > report_bug+0x1f4/0x2b0 lib/bug.c:186 > fixup_bug.part.11+0x37/0x80 arch/x86/kernel/traps.c:178 > fixup_bug arch/x86/kernel/traps.c:247 [inline] > do_error_trap+0x2d7/0x3e0 arch/x86/kernel/traps.c:296 > do_invalid_op+0x1b/0x20 arch/x86/kernel/traps.c:315 > invalid_op+0x1b/0x40 arch/x86/entry/entry_64.S:986 > RIP: 0010:debug_print_object+0x166/0x220 lib/debugobjects.c:288 > RSP: 0000:ffff8801d966f210 EFLAGS: 00010086 > RAX: dffffc0000000008 RBX: 0000000000000003 RCX: ffffffff815acd6e > RDX: 0000000000000000 RSI: 1ffff1003b2cddf2 RDI: 0000000000000000 > RBP: ffff8801d966f250 R08: 0000000000000000 R09: 1ffff1003b2cddc8 > R10: ffffed003b2cde71 R11: ffffffff86f39a98 R12: 0000000000000001 > R13: ffffffff86f15540 R14: ffffffff86408700 R15: ffffffff8147c0a0 > __debug_check_no_obj_freed lib/debugobjects.c:745 [inline] > debug_check_no_obj_freed+0x662/0xf1f lib/debugobjects.c:774 > kfree+0xc7/0x260 mm/slab.c:3799 > process_one_req+0x2e7/0x6c0 drivers/infiniband/core/addr.c:592 > process_one_work+0xc47/0x1bb0 kernel/workqueue.c:2113 > worker_thread+0x223/0x1990 kernel/workqueue.c:2247 > kthread+0x33c/0x400 kernel/kthread.c:238 > ret_from_fork+0x3a/0x50 arch/x86/entry/entry_64.S:406 > > Fixes: 5fff41e1f89d ("IB/core: Fix race condition in resolving IP to MAC") > Reported-by: syzbot+3b4acab09b6463472d0a@xxxxxxxxxxxxxxxxxxxxxxxxx > Signed-off-by: Jason Gunthorpe <jgg@xxxxxxxxxxxx> > --- > drivers/infiniband/core/addr.c | 9 +++++++++ > 1 file changed, 9 insertions(+) > > Leon, I took a look at this last bug you noted so we can get cleaned > up for the next kernel release. > > I didn't repo it, but I did confirm the C repo is calling > rdma_addr_cancel, so I think this is very likely to be the bug.. > > Parav/Mark: Does this make sense? > Yea, there is an issue with calling rdma_addr_cancel(). Parav has a fix to a different issue/symptoms. For now, this is the patch if anyone wants to have a look: https://git.kernel.org/pub/scm/linux/kernel/git/leon/linux-rdma.git/commit/?h=rdma-next&id=0118f2f11612e508f7b8ad003420f60536ffdb7d BTW, Dmitry thanks for doing such an awesome work!! > I'll hang on to this till Monday so you guys can check it. > > diff --git a/drivers/infiniband/core/addr.c b/drivers/infiniband/core/addr.c > index b0a52c99620834..826b1edd9b58b6 100644 > --- a/drivers/infiniband/core/addr.c > +++ b/drivers/infiniband/core/addr.c > @@ -586,6 +586,15 @@ static void process_one_req(struct work_struct *_work) > list_del(&req->list); > mutex_unlock(&lock); > > + /* > + * Although the work will normally have been canceled by the > + * workqueue, it can still be requeued as long as it is on the > + * req_list, so it could have been requeued before we grabbed &lock. > + * We need to cancel it after it is removed from req_list to really be > + * sure it is safe to free. > + */ > + cancel_delayed_work(&req->work); > + > req->callback(req->status, (struct sockaddr *)&req->src_addr, > req->addr, req->context); > put_client(req->client); > Mark -- To unsubscribe from this list: send the line "unsubscribe linux-rdma" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html