Hi Doug,

Did you get a chance to merge this fix, which avoids a kernel crash in an error scenario?

Parav

> -----Original Message-----
> From: Leon Romanovsky [mailto:leon@xxxxxxxxxx]
> Sent: Sunday, March 19, 2017 3:56 AM
> To: Doug Ledford <dledford@xxxxxxxxxx>
> Cc: linux-rdma@xxxxxxxxxxxxxxx; Parav Pandit <parav@xxxxxxxxxxxx>; # v4 . 2+
> <stable@xxxxxxxxxxxxxxx>
> Subject: [PATCH rdma-next 1/3] IB/core: Fix kernel crash during fail to initialize
> device
>
> From: Parav Pandit <parav@xxxxxxxxxxxx>
>
> This patch fixes the kernel crash that occurs when ib_dealloc_device() is
> called because the provider driver fails with an error after
> ib_alloc_device() and before it can register using ib_register_device().
>
> This crash was seen in the lab, as shown below, and can occur with any IB
> device that fails to perform its device initialization before invoking
> ib_register_device().
>
> This patch avoids touching the cache and port immutable structures if the
> device is not yet initialized.
> It also releases the related memory when cache or port immutable data
> structure initialization fails during ib_register_device().
>
> [81416.561946] BUG: unable to handle kernel NULL pointer dereference at (null)
> [81416.570340] IP: ib_cache_release_one+0x29/0x80 [ib_core]
> [81416.576222] PGD 78da66067
> [81416.576223] PUD 7f2d7c067
> [81416.579484] PMD 0
> [81416.582720]
> [81416.587242] Oops: 0000 [#1] SMP
> [81416.722395] task: ffff8807887515c0 task.stack: ffffc900062c0000
> [81416.729148] RIP: 0010:ib_cache_release_one+0x29/0x80 [ib_core]
> [81416.735793] RSP: 0018:ffffc900062c3a90 EFLAGS: 00010202
> [81416.741823] RAX: 0000000000000000 RBX: 0000000000000001 RCX: 0000000000000000
> [81416.749785] RDX: 0000000000000000 RSI: 0000000000000282 RDI: ffff880859fec000
> [81416.757757] RBP: ffffc900062c3aa0 R08: ffff8808536e5ac0 R09: ffff880859fec5b0
> [81416.765708] R10: 00000000536e5c01 R11: ffff8808536e5ac0 R12: ffff880859fec000
> [81416.773672] R13: 0000000000000000 R14: ffff8808536e5ac0 R15: ffff88084ebc0060
> [81416.781621] FS: 00007fd879fab740(0000) GS:ffff88085fac0000(0000) knlGS:0000000000000000
> [81416.790522] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [81416.797094] CR2: 0000000000000000 CR3: 00000007eb215000 CR4: 00000000003406e0
> [81416.805051] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> [81416.812997] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
> [81416.820950] Call Trace:
> [81416.824226] ib_device_release+0x1e/0x40 [ib_core]
> [81416.829858] device_release+0x32/0xa0
> [81416.834370] kobject_cleanup+0x63/0x170
> [81416.839058] kobject_put+0x25/0x50
> [81416.843319] ib_dealloc_device+0x25/0x40 [ib_core]
> [81416.848986] mlx5_ib_add+0x163/0x1990 [mlx5_ib]
> [81416.854414] mlx5_add_device+0x5a/0x160 [mlx5_core]
> [81416.860191] mlx5_register_interface+0x8d/0xc0 [mlx5_core]
> [81416.866587] ? 0xffffffffa09e9000
> [81416.870816] mlx5_ib_init+0x15/0x17 [mlx5_ib]
> [81416.876094] do_one_initcall+0x51/0x1b0
> [81416.880861] ? __vunmap+0x85/0xd0
> [81416.885113] ? kmem_cache_alloc_trace+0x14b/0x1b0
> [81416.890768] ? vfree+0x2e/0x70
> [81416.894762] do_init_module+0x60/0x1fa
> [81416.899441] load_module+0x15f6/0x1af0
> [81416.904114] ? __symbol_put+0x60/0x60
> [81416.908709] ? ima_post_read_file+0x3d/0x80
> [81416.913828] ? security_kernel_post_read_file+0x6b/0x80
> [81416.920006] SYSC_finit_module+0xa6/0xf0
> [81416.924888] SyS_finit_module+0xe/0x10
> [81416.929568] entry_SYSCALL_64_fastpath+0x1a/0xa9
> [81416.935089] RIP: 0033:0x7fd879494949
> [81416.939543] RSP: 002b:00007ffdbc1b4e58 EFLAGS: 00000202 ORIG_RAX: 0000000000000139
> [81416.947982] RAX: ffffffffffffffda RBX: 0000000001b66f00 RCX: 00007fd879494949
> [81416.955965] RDX: 0000000000000000 RSI: 000000000041a13c RDI: 0000000000000003
> [81416.963926] RBP: 0000000000000003 R08: 0000000000000000 R09: 0000000001b652a0
> [81416.971861] R10: 0000000000000003 R11: 0000000000000202 R12: 00007ffdbc1b3e70
> [81416.979763] R13: 00007ffdbc1b3e50 R14: 0000000000000005 R15: 0000000000000000
> [81417.008005] RIP: ib_cache_release_one+0x29/0x80 [ib_core] RSP: ffffc900062c3a90
> [81417.016045] CR2: 0000000000000000
>
> Fixes: 55aeed0654 ("IB/core: Make ib_alloc_device init the kobject")
> Fixes: 7738613e7c ("IB/core: Add per port immutable struct to ib_device")
> Cc: <stable@xxxxxxxxxxxxxxx> # v4.2+
> Reviewed-by: Daniel Jurgens <danielj@xxxxxxxxxxxx>
> Signed-off-by: Parav Pandit <parav@xxxxxxxxxxxx>
> Signed-off-by: Leon Romanovsky <leon@xxxxxxxxxx>
> ---
>  drivers/infiniband/core/device.c | 33 ++++++++++++++++++++++-----------
>  1 file changed, 22 insertions(+), 11 deletions(-)
>
> diff --git a/drivers/infiniband/core/device.c b/drivers/infiniband/core/device.c
> index 593d2ce6ec7c..64a2ae4d8eaa 100644
> --- a/drivers/infiniband/core/device.c
> +++ b/drivers/infiniband/core/device.c
> @@ -172,8 +172,16 @@ static void ib_device_release(struct device *device)
>  {
>  	struct ib_device *dev = container_of(device, struct ib_device, dev);
>
> -	ib_cache_release_one(dev);
> -	kfree(dev->port_immutable);
> +	WARN_ON(dev->reg_state == IB_DEV_REGISTERED);
> +	if (dev->reg_state == IB_DEV_UNREGISTERED) {
> +		/*
> +		 * In IB_DEV_UNINITIALIZED state, cache or port table
> +		 * is not even created. Free cache and port table only when
> +		 * device reaches UNREGISTERED state.
> +		 */
> +		ib_cache_release_one(dev);
> +		kfree(dev->port_immutable);
> +	}
>  	kfree(dev);
>  }
>
> @@ -366,32 +374,27 @@ int ib_register_device(struct ib_device *device,
>  	ret = ib_cache_setup_one(device);
>  	if (ret) {
>  		pr_warn("Couldn't set up InfiniBand P_Key/GID cache\n");
> -		goto out;
> +		goto port_cleanup;
>  	}
>
>  	ret = ib_device_register_rdmacg(device);
>  	if (ret) {
>  		pr_warn("Couldn't register device with rdma cgroup\n");
> -		ib_cache_cleanup_one(device);
> -		goto out;
> +		goto cache_cleanup;
>  	}
>
>  	memset(&device->attrs, 0, sizeof(device->attrs));
>  	ret = device->query_device(device, &device->attrs, &uhw);
>  	if (ret) {
>  		pr_warn("Couldn't query the device attributes\n");
> -		ib_device_unregister_rdmacg(device);
> -		ib_cache_cleanup_one(device);
> -		goto out;
> +		goto cache_cleanup;
>  	}
>
>  	ret = ib_device_register_sysfs(device, port_callback);
>  	if (ret) {
>  		pr_warn("Couldn't register device %s with driver model\n",
>  			device->name);
> -		ib_device_unregister_rdmacg(device);
> -		ib_cache_cleanup_one(device);
> -		goto out;
> +		goto cache_cleanup;
>  	}
>
>  	device->reg_state = IB_DEV_REGISTERED;
> @@ -403,6 +406,14 @@ int ib_register_device(struct ib_device *device,
>  	down_write(&lists_rwsem);
>  	list_add_tail(&device->core_list, &device_list);
>  	up_write(&lists_rwsem);
> +	mutex_unlock(&device_mutex);
> +	return 0;
> +
> +cache_cleanup:
> +	ib_cache_cleanup_one(device);
> +	ib_cache_release_one(device);
> +port_cleanup:
> +	kfree(device->port_immutable);
>  out:
>  	mutex_unlock(&device_mutex);
>  	return ret;
> --
> 2.12.0
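
For anyone skimming the thread, the two ideas in the patch (a release path guarded by registration state, and goto-based unwinding when registration fails) can be illustrated with a minimal userspace sketch. This is not the ib_core code: every name here (toy_dev, toy_alloc, toy_register, toy_unregister, toy_release, toy_frees, the fail_at parameter) is hypothetical, and malloc()/free() stand in for the kernel allocations.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-ins for the device states; not the kernel definitions. */
enum toy_state { TOY_UNINITIALIZED, TOY_REGISTERED, TOY_UNREGISTERED };

struct toy_dev {
	enum toy_state state;
	int *port_table;	/* plays the role of port_immutable */
	int *cache;		/* plays the role of the P_Key/GID cache */
};

int toy_frees;	/* counts sub-resource frees, for illustration only */

struct toy_dev *toy_alloc(void)
{
	/* calloc leaves state == TOY_UNINITIALIZED and both pointers NULL */
	return calloc(1, sizeof(struct toy_dev));
}

/* fail_at: 0 = succeed, 1 = fail at cache setup, 2 = fail after cache setup */
int toy_register(struct toy_dev *dev, int fail_at)
{
	int ret = -1;

	dev->port_table = malloc(sizeof(int));
	if (!dev->port_table)
		goto out;

	if (fail_at == 1)
		goto port_cleanup;
	dev->cache = malloc(sizeof(int));
	if (!dev->cache)
		goto port_cleanup;

	if (fail_at == 2)
		goto cache_cleanup;

	dev->state = TOY_REGISTERED;
	return 0;

	/* Unwind in reverse order of setup; state stays TOY_UNINITIALIZED. */
cache_cleanup:
	free(dev->cache);
	toy_frees++;
port_cleanup:
	free(dev->port_table);
	toy_frees++;
out:
	return ret;
}

void toy_unregister(struct toy_dev *dev)
{
	dev->state = TOY_UNREGISTERED;
}

void toy_release(struct toy_dev *dev)
{
	/*
	 * Only a device that was registered and later unregistered still
	 * owns the cache and port table. In the uninitialized state they
	 * were never created or were already freed by toy_register().
	 */
	if (dev->state == TOY_UNREGISTERED) {
		free(dev->cache);
		free(dev->port_table);
		toy_frees += 2;
	}
	free(dev);
}
```

In this sketch, calling toy_release() right after toy_alloc(), or after a failed toy_register(), frees only the device itself, which mirrors the failing path in the trace above (ib_dealloc_device() reached without a successful ib_register_device()).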