RE: [PATCH rdma-next v1] RDMA/restrack: Don't rely on uninitialized variable in restrack_add flow

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



> On Wed, Mar 14, 2018 at 09:20:39AM -0500, Steve Wise wrote:
> > >
> > > From: Leon Romanovsky <leonro@xxxxxxxxxxxx>
> > >
> > > The restrack code relies on the fact that object structures are zeroed
at
> > > the allocation stage, the mlx4 CQ wasn't allocated with kzalloc and it
> > > caused to the following crash.
> > >
> > > [  137.392209] general protection fault: 0000 [#1] SMP KASAN PTI
> > > [  137.392972] CPU: 0 PID: 622 Comm: ibv_rc_pingpong Tainted: G
W
> > > 4.16.0-rc1-00099-g00313983cda6 #11
> > > [  137.395079] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996),
> BIOS
> > > 1.10.2-2.fc27 04/01/2014
> > > [  137.396866] RIP: 0010:rdma_restrack_del+0xc8/0xf0
> > > [  137.397762] RSP: 0018:ffff8801b54e7968 EFLAGS: 00010206
> > > [  137.399008] RAX: 0000000000000000 RBX: ffff8801d8bcbae8 RCX:
> > > ffffffffb82314df
> > > [  137.400055] RDX: dffffc0000000000 RSI: dffffc0000000000 RDI:
> > > 70696b533d454741
> > > [  137.401103] RBP: ffff8801d90c07a0 R08: ffff8801d8bcbb00 R09:
> > > 0000000000000000
> > > [  137.402470] R10: 0000000000000001 R11: ffffed0036a9cf52 R12:
> > > ffff8801d90c0ad0
> > > [  137.403318] R13: ffff8801d853fb20 R14: ffff8801d8bcbb28 R15:
> > > 0000000000000014
> > > [  137.404736] FS:  00007fb415d43740(0000) GS:ffff8801e5c00000(0000)
> > > knlGS:0000000000000000
> > > [  137.406074] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> > > [  137.407101] CR2: 00007fb41557df20 CR3: 00000001b580c001 CR4:
> > > 00000000003606b0
> > > [  137.408308] DR0: 0000000000000000 DR1: 0000000000000000 DR2:
> > > 0000000000000000
> > > [  137.409352] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7:
> > > 0000000000000400
> > > [  137.410385] Call Trace:
> > > [  137.411058]  ib_destroy_cq+0x23/0x60
> > > [  137.411460]  uverbs_free_cq+0x37/0xa0
> > > [  137.412040]  remove_commit_idr_uobject+0x38/0xf0
> > > [  137.413042]  _rdma_remove_commit_uobject+0x5c/0x160
> > > [  137.413782]  ? lookup_get_idr_uobject+0x39/0x50
> > > [  137.414737]  rdma_remove_commit_uobject+0x3b/0x70
> > > [  137.415742]  ib_uverbs_destroy_cq+0x114/0x1d0
> > > [  137.416260]  ? ib_uverbs_req_notify_cq+0x160/0x160
> > > [  137.417073]  ? kernel_text_address+0x5c/0x90
> > > [  137.417805]  ? __kernel_text_address+0xe/0x30
> > > [  137.418766]  ? unwind_get_return_address+0x2f/0x50
> > > [  137.419558]  ib_uverbs_write+0x453/0x6a0
> > > [  137.420220]  ? show_ibdev+0x90/0x90
> > > [  137.420653]  ? __kasan_slab_free+0x136/0x180
> > > [  137.421155]  ? kmem_cache_free+0x78/0x1e0
> > > [  137.422192]  ? remove_vma+0x83/0x90
> > > [  137.422614]  ? do_munmap+0x447/0x6c0
> > > [  137.423045]  ? vm_munmap+0xb0/0x100
> > > [  137.423481]  ? SyS_munmap+0x1d/0x30
> > > [  137.424120]  ? do_syscall_64+0xeb/0x250
> > > [  137.424984]  ? entry_SYSCALL_64_after_hwframe+0x21/0x86
> > > [  137.425611]  ? lru_add_drain_all+0x270/0x270
> > > [  137.426116]  ? lru_add_drain_cpu+0xa3/0x170
> > > [  137.426616]  ? lru_add_drain+0x11/0x20
> > > [  137.427058]  ? free_pages_and_swap_cache+0xa6/0x120
> > > [  137.427672]  ? tlb_flush_mmu_free+0x78/0x90
> > > [  137.428168]  ? arch_tlb_finish_mmu+0x6d/0xb0
> > > [  137.428680]  __vfs_write+0xc4/0x350
> > > [  137.430917]  ? kernel_read+0xa0/0xa0
> > > [  137.432758]  ? remove_vma+0x90/0x90
> > > [  137.434781]  ? __kasan_slab_free+0x14b/0x180
> > > [  137.437486]  ? remove_vma+0x83/0x90
> > > [  137.439836]  ? kmem_cache_free+0x78/0x1e0
> > > [  137.442195]  ? percpu_counter_add_batch+0x1d/0x90
> > > [  137.444389]  vfs_write+0xf7/0x280
> > > [  137.446030]  SyS_write+0xa1/0x120
> > > [  137.447867]  ? SyS_read+0x120/0x120
> > > [  137.449670]  ? mm_fault_error+0x180/0x180
> > > [  137.451539]  ? _cond_resched+0x16/0x50
> > > [  137.453697]  ? SyS_read+0x120/0x120
> > > [  137.455883]  do_syscall_64+0xeb/0x250
> > > [  137.457686]  entry_SYSCALL_64_after_hwframe+0x21/0x86
> > > [  137.459595] RIP: 0033:0x7fb415637b94
> > > [  137.461315] RSP: 002b:00007ffdebea7d88 EFLAGS: 00000246 ORIG_RAX:
> > > 0000000000000001
> > > [  137.463879] RAX: ffffffffffffffda RBX: 00005565022d1bd0 RCX:
> > > 00007fb415637b94
> > > [  137.466519] RDX: 0000000000000018 RSI: 00007ffdebea7da0 RDI:
> > > 0000000000000003
> > > [  137.469543] RBP: 00007ffdebea7d98 R08: 0000000000000000 R09:
> > > 00005565022d40c0
> > > [  137.472479] R10: 00000000000009cf R11: 0000000000000246 R12:
> > > 00005565022d2520
> > > [  137.475125] R13: 00000000000003e8 R14: 0000000000000000 R15:
> > > 00007ffdebea7fd0
> > > [  137.477760] Code: f7 e8 dd 0d 0b ff 48 c7 43 40 00 00 00 00 48 89
df e8
> > 0d 0b
> > > 0b ff 48 8d 7b 28 c6 03 00 e8 41 0d 0b ff 48 8b 7b 28 48 85 ff 74 06
<f0>
> > ff 4f 48
> > > 74 10 5b 48 89 ef 5d 41 5c 41 5d 41 5e e9 32 b0 ee
> > > [  137.483375] RIP: rdma_restrack_del+0xc8/0xf0 RSP: ffff8801b54e7968
> > > [  137.486436] ---[ end trace 81835a1ea6722eed ]---
> > > [  137.488566] Kernel panic - not syncing: Fatal exception
> > > [  137.491162] Kernel Offset: 0x36000000 from 0xffffffff81000000
> > (relocation
> > > range: 0xffffffff80000000-0xffffffffbfffffff)
> > >
> > > Fixes: 00313983cda6 ("RDMA/nldev: provide detailed CM_ID information")
> > > Signed-off-by: Leon Romanovsky <leonro@xxxxxxxxxxxx>
> > > ---
> > >  drivers/infiniband/core/restrack.c | 5 ++---
> > >  1 file changed, 2 insertions(+), 3 deletions(-)
> > >
> > > diff --git a/drivers/infiniband/core/restrack.c
> > > b/drivers/infiniband/core/restrack.c
> > > index e1d9934d6e81..91bf5df11608 100644
> > > --- a/drivers/infiniband/core/restrack.c
> > > +++ b/drivers/infiniband/core/restrack.c
> > > @@ -113,13 +113,12 @@ void rdma_restrack_add(struct
> rdma_restrack_entry
> > > *res)
> > >  	if (!dev)
> > >  		return;
> > >
> > > +	res->task = NULL;
> > >  	if (res_is_user(res)) {
> > > -		if (!res->task)
> > > -			rdma_restrack_set_task(res, current);
> > > +		rdma_restrack_set_task(res, current);
> > >  		res->kern_name = NULL;
> > >  	} else {
> > >  		set_kern_name(res);
> > > -		res->task = NULL;
> > >  	}
> > >
> > >  	kref_init(&res->kref);
> > > --
> > > 2.14.3
> >
> > This breaks cm_id pid/kern_name...  You just undid the change I added.
NAK.
> > The problem is that at the time the CMA adds the restrack resource, it
is
> > running in a workqueue context for some cases and that pid is incorrect.
> 
> Not exactly, I used the fact that later the RDMA-CM will overwrite ->task.
> 
> Before calling rdma_restrack_add, the code had one of two states:
> 1. res->task == NULL => no change in behaviour
> 2. res->task != NULL => before it skipped, now it will connect.
> 

Hmm...  Do you get the correct PIDs for user mode and kernel mode cm_ids?
Both active and passive sides of the connections?  Just run rping for the
user side, and nvmef or iser on the kernel side, and verify all the pids are
correct and/or kern_name's are correct.   If that is the case, then
NAK->ACK...


> What about such change on top of this fix?
> 
> diff --git a/drivers/infiniband/core/restrack.c
> b/drivers/infiniband/core/restrack.c
> index 91bf5df11608..2e6ae88e830b 100644
> --- a/drivers/infiniband/core/restrack.c
> +++ b/drivers/infiniband/core/restrack.c
> @@ -115,7 +115,9 @@ void rdma_restrack_add(struct rdma_restrack_entry
> *res)
> 
>          res->task = NULL;
>          if (res_is_user(res)) {
>  -               rdma_restrack_set_task(res, current);
>  +               /* it will be set later for RDMA-CM */
>  +               if (res->type != RDMA_RESTRACK_CM_ID)
>  +                       rdma_restrack_set_task(res,  current);
> 	                 res->kern_name = NULL;
> 	         } else {
> 	                 set_kern_name(res);
> 		 }
> 
> 

I think you would not want to set res->track to NULL for CM_ID restracks as
well.  But a solution that special cases CM_ID is ok with me.



--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html



[Index of Archives]     [Linux USB Devel]     [Video for Linux]     [Linux Audio Users]     [Photo]     [Yosemite News]     [Yosemite Photos]     [Linux Kernel]     [Linux SCSI]     [XFree86]

  Powered by Linux