> > From: Leon Romanovsky <leonro@xxxxxxxxxxxx> > > The restrack code relies on the fact that object structures are zeroed at > the allocation stage, the mlx4 CQ wasn't allocated with kzalloc and it > caused to the following crash. > > [ 137.392209] general protection fault: 0000 [#1] SMP KASAN PTI > [ 137.392972] CPU: 0 PID: 622 Comm: ibv_rc_pingpong Tainted: G W > 4.16.0-rc1-00099-g00313983cda6 #11 > [ 137.395079] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS > 1.10.2-2.fc27 04/01/2014 > [ 137.396866] RIP: 0010:rdma_restrack_del+0xc8/0xf0 > [ 137.397762] RSP: 0018:ffff8801b54e7968 EFLAGS: 00010206 > [ 137.399008] RAX: 0000000000000000 RBX: ffff8801d8bcbae8 RCX: > ffffffffb82314df > [ 137.400055] RDX: dffffc0000000000 RSI: dffffc0000000000 RDI: > 70696b533d454741 > [ 137.401103] RBP: ffff8801d90c07a0 R08: ffff8801d8bcbb00 R09: > 0000000000000000 > [ 137.402470] R10: 0000000000000001 R11: ffffed0036a9cf52 R12: > ffff8801d90c0ad0 > [ 137.403318] R13: ffff8801d853fb20 R14: ffff8801d8bcbb28 R15: > 0000000000000014 > [ 137.404736] FS: 00007fb415d43740(0000) GS:ffff8801e5c00000(0000) > knlGS:0000000000000000 > [ 137.406074] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 > [ 137.407101] CR2: 00007fb41557df20 CR3: 00000001b580c001 CR4: > 00000000003606b0 > [ 137.408308] DR0: 0000000000000000 DR1: 0000000000000000 DR2: > 0000000000000000 > [ 137.409352] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: > 0000000000000400 > [ 137.410385] Call Trace: > [ 137.411058] ib_destroy_cq+0x23/0x60 > [ 137.411460] uverbs_free_cq+0x37/0xa0 > [ 137.412040] remove_commit_idr_uobject+0x38/0xf0 > [ 137.413042] _rdma_remove_commit_uobject+0x5c/0x160 > [ 137.413782] ? lookup_get_idr_uobject+0x39/0x50 > [ 137.414737] rdma_remove_commit_uobject+0x3b/0x70 > [ 137.415742] ib_uverbs_destroy_cq+0x114/0x1d0 > [ 137.416260] ? ib_uverbs_req_notify_cq+0x160/0x160 > [ 137.417073] ? kernel_text_address+0x5c/0x90 > [ 137.417805] ? __kernel_text_address+0xe/0x30 > [ 137.418766] ? unwind_get_return_address+0x2f/0x50 > [ 137.419558] ib_uverbs_write+0x453/0x6a0 > [ 137.420220] ? show_ibdev+0x90/0x90 > [ 137.420653] ? __kasan_slab_free+0x136/0x180 > [ 137.421155] ? kmem_cache_free+0x78/0x1e0 > [ 137.422192] ? remove_vma+0x83/0x90 > [ 137.422614] ? do_munmap+0x447/0x6c0 > [ 137.423045] ? vm_munmap+0xb0/0x100 > [ 137.423481] ? SyS_munmap+0x1d/0x30 > [ 137.424120] ? do_syscall_64+0xeb/0x250 > [ 137.424984] ? entry_SYSCALL_64_after_hwframe+0x21/0x86 > [ 137.425611] ? lru_add_drain_all+0x270/0x270 > [ 137.426116] ? lru_add_drain_cpu+0xa3/0x170 > [ 137.426616] ? lru_add_drain+0x11/0x20 > [ 137.427058] ? free_pages_and_swap_cache+0xa6/0x120 > [ 137.427672] ? tlb_flush_mmu_free+0x78/0x90 > [ 137.428168] ? arch_tlb_finish_mmu+0x6d/0xb0 > [ 137.428680] __vfs_write+0xc4/0x350 > [ 137.430917] ? kernel_read+0xa0/0xa0 > [ 137.432758] ? remove_vma+0x90/0x90 > [ 137.434781] ? __kasan_slab_free+0x14b/0x180 > [ 137.437486] ? remove_vma+0x83/0x90 > [ 137.439836] ? kmem_cache_free+0x78/0x1e0 > [ 137.442195] ? percpu_counter_add_batch+0x1d/0x90 > [ 137.444389] vfs_write+0xf7/0x280 > [ 137.446030] SyS_write+0xa1/0x120 > [ 137.447867] ? SyS_read+0x120/0x120 > [ 137.449670] ? mm_fault_error+0x180/0x180 > [ 137.451539] ? _cond_resched+0x16/0x50 > [ 137.453697] ? SyS_read+0x120/0x120 > [ 137.455883] do_syscall_64+0xeb/0x250 > [ 137.457686] entry_SYSCALL_64_after_hwframe+0x21/0x86 > [ 137.459595] RIP: 0033:0x7fb415637b94 > [ 137.461315] RSP: 002b:00007ffdebea7d88 EFLAGS: 00000246 ORIG_RAX: > 0000000000000001 > [ 137.463879] RAX: ffffffffffffffda RBX: 00005565022d1bd0 RCX: > 00007fb415637b94 > [ 137.466519] RDX: 0000000000000018 RSI: 00007ffdebea7da0 RDI: > 0000000000000003 > [ 137.469543] RBP: 00007ffdebea7d98 R08: 0000000000000000 R09: > 00005565022d40c0 > [ 137.472479] R10: 00000000000009cf R11: 0000000000000246 R12: > 00005565022d2520 > [ 137.475125] R13: 00000000000003e8 R14: 0000000000000000 R15: > 00007ffdebea7fd0 > [ 137.477760] Code: f7 e8 dd 0d 0b ff 48 c7 43 40 00 00 00 00 48 89 df e8 0d 0b > 0b ff 48 8d 7b 28 c6 03 00 e8 41 0d 0b ff 48 8b 7b 28 48 85 ff 74 06 <f0> ff 4f 48 > 74 10 5b 48 89 ef 5d 41 5c 41 5d 41 5e e9 32 b0 ee > [ 137.483375] RIP: rdma_restrack_del+0xc8/0xf0 RSP: ffff8801b54e7968 > [ 137.486436] ---[ end trace 81835a1ea6722eed ]--- > [ 137.488566] Kernel panic - not syncing: Fatal exception > [ 137.491162] Kernel Offset: 0x36000000 from 0xffffffff81000000 (relocation > range: 0xffffffff80000000-0xffffffffbfffffff) > > Fixes: 00313983cda6 ("RDMA/nldev: provide detailed CM_ID information") > Signed-off-by: Leon Romanovsky <leonro@xxxxxxxxxxxx> > --- > drivers/infiniband/core/restrack.c | 5 ++--- > 1 file changed, 2 insertions(+), 3 deletions(-) > > diff --git a/drivers/infiniband/core/restrack.c > b/drivers/infiniband/core/restrack.c > index e1d9934d6e81..91bf5df11608 100644 > --- a/drivers/infiniband/core/restrack.c > +++ b/drivers/infiniband/core/restrack.c > @@ -113,13 +113,12 @@ void rdma_restrack_add(struct rdma_restrack_entry > *res) > if (!dev) > return; > > + res->task = NULL; > if (res_is_user(res)) { > - if (!res->task) > - rdma_restrack_set_task(res, current); > + rdma_restrack_set_task(res, current); > res->kern_name = NULL; > } else { > set_kern_name(res); > - res->task = NULL; > } > > kref_init(&res->kref); > -- > 2.14.3 This breaks cm_id pid/kern_name... You just undid the change I added. NAK. The problem is that at the time the CMA adds the restrack resource, it is running in a workqueue context for some cases and that pid is incorrect. Steve. -- To unsubscribe from this list: send the line "unsubscribe linux-rdma" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html