On 8/18/2020 3:49 PM, Leon Romanovsky wrote:
On Tue, Aug 18, 2020 at 08:50:57AM +0300, Kamal Heib wrote:
On Tue, Aug 18, 2020 at 09:48:43AM +0800, Zhu Yanjun wrote:
On 8/17/2020 6:12 AM, Kamal Heib wrote:
On Sat, Aug 15, 2020 at 02:58:45PM +0800, Zhu Yanjun wrote:
On 8/12/2020 7:14 PM, Kamal Heib wrote:
To avoid the following kernel panic when calling kmem_cache_alloc()
with a NULL cache pointer obtained from pool_cache(),
What is the root cause of this kernel panic?
The kernel panic is triggered by the following command, and it happens
because the caches have not been initialized yet:
modprobe rdma_rxe add=eno1
Thanks,
Kamal
Zhu Yanjun
move rxe_cache_init() into the
context of device creation.
BUG: unable to handle kernel NULL pointer dereference at 000000000000000b
PGD 0 P4D 0
Oops: 0000 [#1] SMP NOPTI
CPU: 4 PID: 8512 Comm: modprobe Kdump: loaded Not tainted 4.18.0-231.el8.x86_64 #1
Hardware name: HPE ProLiant DL385 Gen10/ProLiant DL385 Gen10, BIOS A40 10/02/2018
RIP: 0010:kmem_cache_alloc+0xd1/0x1b0
Code: 8b 57 18 45 8b 77 1c 48 8b 5c 24 30 0f 1f 44 00 00 5b 48 89 e8 5d 41 5c 41 5d 41 5e 41 5f c3 81 e3 00 00 10 00 75 0e 4d 89 fe <41> f6 47 0b 04 0f 84 6c ff ff ff 4c 89 ff e8 cc da 01 00 49 89 c6
RSP: 0018:ffffa2b8c773f9d0 EFLAGS: 00010246
RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000005
RDX: 0000000000000004 RSI: 00000000006080c0 RDI: 0000000000000000
RBP: ffff8ea0a8634fd0 R08: ffffa2b8c773f988 R09: 00000000006000c0
R10: 0000000000000000 R11: 0000000000000230 R12: 00000000006080c0
R13: ffffffffc0a97fc8 R14: 0000000000000000 R15: 0000000000000000
FS: 00007f9138ed9740(0000) GS:ffff8ea4ae800000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 000000000000000b CR3: 000000046d59a000 CR4: 00000000003406e0
Call Trace:
rxe_alloc+0xc8/0x160 [rdma_rxe]
rxe_get_dma_mr+0x25/0xb0 [rdma_rxe]
__ib_alloc_pd+0xcb/0x160 [ib_core]
ib_mad_init_device+0x296/0x8b0 [ib_core]
add_client_context+0x11a/0x160 [ib_core]
enable_device_and_get+0xdc/0x1d0 [ib_core]
ib_register_device+0x572/0x6b0 [ib_core]
? crypto_create_tfm+0x32/0xe0
? crypto_create_tfm+0x7a/0xe0
? crypto_alloc_tfm+0x58/0xf0
rxe_register_device+0x19d/0x1c0 [rdma_rxe]
rxe_net_add+0x3d/0x70 [rdma_rxe]
? dev_get_by_name_rcu+0x73/0x90
rxe_param_set_add+0xaf/0xc0 [rdma_rxe]
parse_args+0x179/0x370
? ref_module+0x1b0/0x1b0
load_module+0x135e/0x17e0
? ref_module+0x1b0/0x1b0
? __do_sys_init_module+0x13b/0x180
__do_sys_init_module+0x13b/0x180
do_syscall_64+0x5b/0x1a0
entry_SYSCALL_64_after_hwframe+0x65/0xca
RIP: 0033:0x7f9137ed296e
Fixes: 8700e3e7c485 ("Soft RoCE driver")
Signed-off-by: Kamal Heib <kamalheib1@xxxxxxxxx>
---
drivers/infiniband/sw/rxe/rxe.c | 14 +++++++-------
drivers/infiniband/sw/rxe/rxe_pool.c | 3 +++
drivers/infiniband/sw/rxe/rxe_sysfs.c | 7 +++++++
3 files changed, 17 insertions(+), 7 deletions(-)
diff --git a/drivers/infiniband/sw/rxe/rxe.c b/drivers/infiniband/sw/rxe/rxe.c
index 5642eefb4ba1..60d5086dd34d 100644
--- a/drivers/infiniband/sw/rxe/rxe.c
+++ b/drivers/infiniband/sw/rxe/rxe.c
@@ -318,6 +318,13 @@ static int rxe_newlink(const char *ibdev_name, struct net_device *ndev)
goto err;
}
+ /* initialize slab caches for managed objects */
+ err = rxe_cache_init();
+ if (err) {
+ pr_err("unable to init object pools\n");
+ goto err;
+ }
+
err = rxe_net_add(ibdev_name, ndev);
if (err) {
pr_err("failed to add %s\n", ndev->name);
@@ -336,13 +343,6 @@ static int __init rxe_module_init(void)
{
int err;
- /* initialize slab caches for managed objects */
- err = rxe_cache_init();
When rdma_rxe is loaded with modprobe, rxe_module_init() should be
called, and rxe_cache_init() should be called from it as well.
Why does the above call trace occur?
Zhu Yanjun
As the call trace in the commit message shows, when running
"modprobe rdma_rxe add=eno1", load_module() parses the module parameters
(parse_args()) before it calls the module's init function. As a result,
rxe_param_set_add() is called before rxe_module_init(), i.e. before the
caches are initialized, and the crash occurs when the newly allocated rxe
device is registered from the context of rxe_param_set_add() while the
caches are still uninitialized.
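A minimal userspace model of that ordering, assuming nothing beyond what the call trace shows (load_module() -> parse_args() -> rxe_param_set_add() runs before the module's init function). All names below are hypothetical stand-ins, not rdma_rxe code, and a boolean result takes the place of the NULL dereference.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-ins that only mirror the shape of the rdma_rxe code. */
struct fake_cache { size_t obj_size; };

/* Plays the role of the slab cache: NULL until module init runs. */
static struct fake_cache *obj_cache;

/* Stand-in for rxe_cache_init(), called from the module init function. */
static int fake_cache_init(void)
{
	obj_cache = malloc(sizeof(*obj_cache));
	if (!obj_cache)
		return -1;
	obj_cache->obj_size = 64;
	return 0;
}

/* Stand-in for module exit: releases the cache. */
static void fake_module_exit(void)
{
	free(obj_cache);
	obj_cache = NULL;
}

/* Stand-in for the "add=" handler: registering a device allocates from
 * the cache. Returns false where the real code would oops. */
static bool fake_param_set_add(void)
{
	return obj_cache != NULL;
}

/* Mirrors the order in load_module(): parameter handlers run during
 * argument parsing, and only afterwards is the init function called. */
static bool fake_load_module(bool has_add_param)
{
	bool ok = true;

	if (has_add_param)
		ok = fake_param_set_add();  /* runs first: cache still NULL */
	if (fake_cache_init() != 0)         /* module init runs afterwards */
		return false;
	return ok;
}
```

In this model, fake_load_module(true) fails (the simulated oops), while fake_load_module(false) followed by fake_param_set_add() succeeds, which roughly corresponds to using the add parameter only after the module has finished loading.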
I would expect the fix to be in rxe_init() instead of adding calls to
rxe_cache_init() in all places.
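One way to read this suggestion, sketched under assumptions: the cache setup moves into the single init path that every device creation already goes through, and a flag makes sure the caches are created only once. The function names are hypothetical stand-ins, not the driver's actual functions.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical names; they only model doing the cache setup from one
 * common device-init path instead of from every caller. */

static int cache_init_calls;  /* how many times the caches were created */
static bool caches_ready;     /* set once the caches exist */

/* Stand-in for rxe_cache_init(). */
static int sketch_cache_init(void)
{
	cache_init_calls++;
	return 0;
}

/* Stand-in for the shared init path (the role suggested for rxe_init()).
 * Not thread-safe: kernel code would need a lock around this check, since
 * devices can be added both via netlink and via the module parameter. */
static int sketch_dev_init(void)
{
	if (!caches_ready) {
		int err = sketch_cache_init();

		if (err)
			return err;
		caches_ready = true;
	}
	/* per-device pool setup would follow here */
	return 0;
}

/* Both ways of creating a device funnel through sketch_dev_init(). */
static int sketch_newlink(void)   { return sketch_dev_init(); }
static int sketch_param_add(void) { return sketch_dev_init(); }
```

With a single guarded path, a new entry point for device creation does not need to remember its own rxe_cache_init() call.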
I agree with you.
Is it possible to make rxe_module_init() run before rxe_param_set_add()?
Thanks
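Because load_module() parses parameters before it calls the init function, the module cannot simply reorder the two. One possible workaround, sketched here with hypothetical names, is a flag set at the end of module init that makes the parameter handler refuse early requests instead of crashing. The -EAGAIN return value is an assumption.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical flag: set only after module init has created the caches. */
static bool sketch_module_live;
static int sketch_devices;  /* devices added so far */

/* Stand-in for the "add=" handler. Returning -EAGAIN is an assumption;
 * any error that makes the early request fail cleanly would do. */
static int sketch_param_set_add(void)
{
	if (!sketch_module_live)
		return -EAGAIN;  /* caches not ready: refuse instead of oopsing */
	sketch_devices++;        /* device registration would happen here */
	return 0;
}

/* Stand-in for rxe_module_init(). */
static int sketch_module_init(void)
{
	/* cache and transport setup would go here */
	sketch_module_live = true;
	return 0;
}
```

An error returned from a parameter handler during load is expected to fail the whole module load, so this approach trades the oops for a failed modprobe rather than making add= work at load time.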