In kernel 4.14.97, the function rxe_cache_clean does not exist.
It was introduced in the following commit.
"
commit 6db21d8986e14e2e86573a3b055b05296188bd2c
Author: Yuval Shaia <yuval.shaia@xxxxxxxxxx>
Date:   Sun Dec 9 15:53:49 2018 +0200

    IB/rxe: Fix incorrect cache cleanup in error flow

    Array iterator stays at the same slot, fix it.

    Fixes: 8700e3e7c485 ("Soft RoCE driver")
    Signed-off-by: Yuval Shaia <yuval.shaia@xxxxxxxxxx>
    Reviewed-by: Bart Van Assche <bvanassche@xxxxxxx>
    Reviewed-by: Zhu Yanjun <yanjun.zhu@xxxxxxxxxx>
    Reviewed-by: Leon Romanovsky <leonro@xxxxxxxxxxxx>
    Signed-off-by: Jason Gunthorpe <jgg@xxxxxxxxxxxx>
"

On Tue, Feb 11, 2020 at 4:33 PM Frank Huang <tigerinxm@xxxxxxxxx> wrote:
>
> This is the first time I have hit this bug, and I haven't found what triggers it yet.
>
> We kill the process with kill -9 in some situations. Could that cause it?
>
> Before this happens, there are some error reports:
>
> Feb 11 04:24:55 kernel: rdma_rxe: no qp matches qpn 0x31f5
> Feb 11 04:24:55 kernel: rdma_rxe: no qp matches qpn 0x31f5
> Feb 11 04:24:55 kernel: rdma_rxe: no qp matches qpn 0x31f5
> Feb 11 04:24:55 kernel: rdma_rxe: no qp matches qpn 0x31f5
> Feb 11 04:24:55 kernel: rdma_rxe: no qp matches qpn 0x31f5
> Feb 11 04:24:55 kernel: rdma_rxe: no qp matches qpn 0x31f5
> Feb 11 04:24:55 kernel: rdma_rxe: no qp matches qpn 0x31f5
> Feb 11 04:24:55 kernel: rdma_rxe: no qp matches qpn 0x31f5
> Feb 11 04:24:55 kernel: rdma_rxe: no qp matches qpn 0x31f5
> Feb 11 04:24:55 kernel: rdma_rxe: no qp matches qpn 0x31f5
>
> On Tue, Feb 11, 2020 at 3:42 PM Zhu Yanjun <zyjzyj2000@xxxxxxxxx> wrote:
> >
> > Can this bug be reproduced?
> >
> > Zhu Yanjun
> >
> > On Tue, Feb 11, 2020 at 3:32 PM Frank Huang <tigerinxm@xxxxxxxxx> wrote:
> > >
> > > Re-posting the log, sorry for the format.
> > >
> > > Feb 11 14:17:31 kernel: =============================================================================
> > > Feb 11 14:17:31 kernel: BUG rxe-qp (Tainted: G OE ): Objects remaining in rxe-qp on __kmem_cache_shutdown()
> > > Feb 11 14:17:31 kernel: -----------------------------------------------------------------------------
> > > Feb 11 14:17:31 kernel: Disabling lock debugging due to kernel taint
> > > Feb 11 14:17:31 kernel: INFO: Slab 0xfffff4c4b027a000 objects=16 used=1 fp=0xffff96f3c9e83f00 flags=0x17ffffc0008100
> > > Feb 11 14:17:31 kernel: CPU: 27 PID: 25588 Comm: rmmod Tainted: G B OE 4.14.97-.el7.centos.x86_64 #1
> > > Feb 11 14:17:31 kernel: Hardware name: /80010056, BIOS 4.1.15 03/28/2017
> > > Feb 11 14:17:31 kernel: Call Trace:
> > > Feb 11 14:17:31 kernel: dump_stack+0x5a/0x7b
> > > Feb 11 14:17:31 kernel: slab_err+0xb4/0xe0
> > > Feb 11 14:17:31 kernel: ? calibrate_delay+0x138/0x5f0
> > > Feb 11 14:17:31 kernel: ? on_each_cpu_mask+0x27/0x60
> > > Feb 11 14:17:31 kernel: ? on_each_cpu_cond+0xaf/0x140
> > > Feb 11 14:17:31 kernel: ? __kmalloc+0x179/0x200
> > > Feb 11 14:17:31 kernel: ? __kmem_cache_shutdown+0x194/0x3d0
> > > Feb 11 14:17:31 kernel: __kmem_cache_shutdown+0x1b4/0x3d0
> > > Feb 11 14:17:31 kernel: shutdown_cache+0x13/0x1b0
> > > Feb 11 14:17:31 kernel: kmem_cache_destroy+0x1e4/0x220
> > > Feb 11 14:17:31 kernel: rxe_cache_clean+0x41/0x60 [rdma_rxe]
> > > Feb 11 14:17:31 kernel: rxe_module_exit+0xf/0x68 [rdma_rxe]
> > > Feb 11 14:17:31 kernel: SyS_delete_module+0x175/0x270
> > > Feb 11 14:17:31 kernel: do_syscall_64+0x74/0x190
> > > Feb 11 14:17:31 kernel: entry_SYSCALL_64_after_hwframe+0x3d/0xa2
> > > Feb 11 14:17:31 kernel: RIP: 0033:0x7ff146d3f517
> > > Feb 11 14:17:31 kernel: RSP: 002b:00007ffd4b5c1598 EFLAGS: 00000202 ORIG_RAX: 00000000000000b0
> > > Feb 11 14:17:31 kernel: RAX: ffffffffffffffda RBX: 0000000000d78280 RCX: 00007ff146d3f517
> > > Feb 11 14:17:31 kernel: RDX: 00007ff146db3ca0 RSI: 0000000000000800 RDI: 0000000000d782e8
> > > Feb 11 14:17:31 kernel: RBP: 0000000000000000 R08: 00007ff147008060 R09: 00007ff146db3ca0
> > > Feb 11 14:17:31 kernel: R10: 00007ffd4b5c1020 R11: 0000000000000202 R12: 00007ffd4b5c36ca
> > > Feb 11 14:17:31 kernel: R13: 0000000000000000 R14: 0000000000d78280 R15: 0000000000d78010
> > > Feb 11 14:17:31 kernel: INFO: Object 0xffff96f3c9e84ec0 @offset=20160
> > > Feb 11 14:17:31 kernel: kmem_cache_destroy rxe-qp: Slab cache still has objects
> > > Feb 11 14:17:31 kernel: CPU: 27 PID: 25588 Comm: rmmod Tainted: G B OE 4.14.97-.el7.centos.x86_64 #1
> > > Feb 11 14:17:31 kernel: Hardware name: /80010056, BIOS 4.1.15 03/28/2017
> > > Feb 11 14:17:31 kernel: Call Trace:
> > > Feb 11 14:17:31 kernel: dump_stack+0x5a/0x7b
> > > Feb 11 14:17:31 kernel: kmem_cache_destroy+0x203/0x220
> > > Feb 11 14:17:31 kernel: rxe_cache_clean+0x41/0x60 [rdma_rxe]
> > > Feb 11 14:17:31 kernel: rxe_module_exit+0xf/0x68 [rdma_rxe]
> > > Feb 11 14:17:31 kernel: SyS_delete_module+0x175/0x270
> > > Feb 11 14:17:31 kernel: do_syscall_64+0x74/0x190
> > > Feb 11 14:17:31 kernel: entry_SYSCALL_64_after_hwframe+0x3d/0xa2
> > > Feb 11 14:17:31 kernel: RIP: 0033:0x7ff146d3f517
> > > Feb 11 14:17:31 kernel: RSP: 002b:00007ffd4b5c1598 EFLAGS: 00000202 ORIG_RAX: 00000000000000b0
> > > Feb 11 14:17:31 kernel: RAX: ffffffffffffffda RBX: 0000000000d78280 RCX: 00007ff146d3f517
> > > Feb 11 14:17:31 kernel: RDX: 00007ff146db3ca0 RSI: 0000000000000800 RDI: 0000000000d782e8
> > >
> > > On Tue, Feb 11, 2020 at 3:09 PM Frank Huang <tigerinxm@xxxxxxxxx> wrote:
> > > >
> > > > Hi, All
> > > >
> > > > When I use an old version of rdma_rxe (kernel 4.14.97), there is a
> > > > slab leak of a qp. Is it fixed in the newest version? I looked through
> > > > the commit history on kernel.org but have not found the same issue.
> > > >
> > > >
> > > > Feb 11 14:17:31 57c4c63f-e817-4e22-aec9-72bc376b757c kernel: =============================================================================
> > > > Feb 11 14:17:31 57c4c63f-e817-4e22-aec9-72bc376b757c kernel: BUG rxe-qp (Tainted: G OE ): Objects remaining in rxe-qp on __kmem_cache_shutdown()
> > > > Feb 11 14:17:31 57c4c63f-e817-4e22-aec9-72bc376b757c kernel: -----------------------------------------------------------------------------
> > > > Feb 11 14:17:31 57c4c63f-e817-4e22-aec9-72bc376b757c kernel: Disabling lock debugging due to kernel taint
> > > > Feb 11 14:17:31 57c4c63f-e817-4e22-aec9-72bc376b757c kernel: INFO: Slab 0xfffff4c4b027a000 objects=16 used=1 fp=0xffff96f3c9e83f00 flags=0x17ffffc0008100
> > > > Feb 11 14:17:31 57c4c63f-e817-4e22-aec9-72bc376b757c kernel: CPU: 27 PID: 25588 Comm: rmmod Tainted: G B OE 4.14.97-.el7.centos.x86_64 #1
> > > > Feb 11 14:17:31 57c4c63f-e817-4e22-aec9-72bc376b757c kernel: Hardware name: 80010056, BIOS 4.1.15 03/28/2017
> > > > Feb 11 14:17:31 57c4c63f-e817-4e22-aec9-72bc376b757c kernel: Call Trace:
> > > > Feb 11 14:17:31 57c4c63f-e817-4e22-aec9-72bc376b757c kernel: dump_stack+0x5a/0x7b
> > > > Feb 11 14:17:31 57c4c63f-e817-4e22-aec9-72bc376b757c kernel: slab_err+0xb4/0xe0
> > > > Feb 11 14:17:31 57c4c63f-e817-4e22-aec9-72bc376b757c kernel: ? calibrate_delay+0x138/0x5f0
> > > > Feb 11 14:17:31 57c4c63f-e817-4e22-aec9-72bc376b757c kernel: ? on_each_cpu_mask+0x27/0x60
> > > > Feb 11 14:17:31 57c4c63f-e817-4e22-aec9-72bc376b757c kernel: ? on_each_cpu_cond+0xaf/0x140
> > > > Feb 11 14:17:31 57c4c63f-e817-4e22-aec9-72bc376b757c kernel: ? __kmalloc+0x179/0x200
> > > > Feb 11 14:17:31 57c4c63f-e817-4e22-aec9-72bc376b757c kernel: ? __kmem_cache_shutdown+0x194/0x3d0
> > > > Feb 11 14:17:31 57c4c63f-e817-4e22-aec9-72bc376b757c kernel: __kmem_cache_shutdown+0x1b4/0x3d0
> > > > Feb 11 14:17:31 57c4c63f-e817-4e22-aec9-72bc376b757c kernel: shutdown_cache+0x13/0x1b0
> > > > Feb 11 14:17:31 57c4c63f-e817-4e22-aec9-72bc376b757c kernel: kmem_cache_destroy+0x1e4/0x220
> > > > Feb 11 14:17:31 57c4c63f-e817-4e22-aec9-72bc376b757c kernel: rxe_cache_clean+0x41/0x60 [rdma_rxe]
> > > > Feb 11 14:17:31 57c4c63f-e817-4e22-aec9-72bc376b757c kernel: rxe_module_exit+0xf/0x68 [rdma_rxe]
> > > > Feb 11 14:17:31 57c4c63f-e817-4e22-aec9-72bc376b757c kernel: SyS_delete_module+0x175/0x270
> > > > Feb 11 14:17:31 57c4c63f-e817-4e22-aec9-72bc376b757c kernel: do_syscall_64+0x74/0x190
> > > > Feb 11 14:17:31 57c4c63f-e817-4e22-aec9-72bc376b757c kernel: entry_SYSCALL_64_after_hwframe+0x3d/0xa2
> > > > Feb 11 14:17:31 57c4c63f-e817-4e22-aec9-72bc376b757c kernel: RIP: 0033:0x7ff146d3f517
> > > > Feb 11 14:17:31 57c4c63f-e817-4e22-aec9-72bc376b757c kernel: RSP: 002b:00007ffd4b5c1598 EFLAGS: 00000202 ORIG_RAX: 00000000000000b0
> > > > Feb 11 14:17:31 57c4c63f-e817-4e22-aec9-72bc376b757c kernel: RAX: ffffffffffffffda RBX: 0000000000d78280 RCX: 00007ff146d3f517
> > > > Feb 11 14:17:31 57c4c63f-e817-4e22-aec9-72bc376b757c kernel: RDX: 00007ff146db3ca0 RSI: 0000000000000800 RDI: 0000000000d782e8
> > > > Feb 11 14:17:31 57c4c63f-e817-4e22-aec9-72bc376b757c kernel: RBP: 0000000000000000 R08: 00007ff147008060 R09: 00007ff146db3ca0
> > > > Feb 11 14:17:31 57c4c63f-e817-4e22-aec9-72bc376b757c kernel: R10: 00007ffd4b5c1020 R11: 0000000000000202 R12: 00007ffd4b5c36ca
> > > > Feb 11 14:17:31 57c4c63f-e817-4e22-aec9-72bc376b757c kernel: R13: 0000000000000000 R14: 0000000000d78280 R15: 0000000000d78010
> > > > Feb 11 14:17:31 57c4c63f-e817-4e22-aec9-72bc376b757c kernel: INFO: Object 0xffff96f3c9e84ec0 @offset=20160
> > > > Feb 11 14:17:31 57c4c63f-e817-4e22-aec9-72bc376b757c kernel: kmem_cache_destroy rxe-qp: Slab cache still has objects
> > > > Feb 11 14:17:31 57c4c63f-e817-4e22-aec9-72bc376b757c kernel: CPU: 27 PID: 25588 Comm: rmmod Tainted: G B OE 4.14.97-.el7.centos.x86_64 #1
> > > > Feb 11 14:17:31 57c4c63f-e817-4e22-aec9-72bc376b757c kernel: Hardware name: 80010056, BIOS 4.1.15 03/28/2017
> > > > Feb 11 14:17:31 57c4c63f-e817-4e22-aec9-72bc376b757c kernel: Call Trace:
> > > > Feb 11 14:17:31 57c4c63f-e817-4e22-aec9-72bc376b757c kernel: dump_stack+0x5a/0x7b
> > > > Feb 11 14:17:31 57c4c63f-e817-4e22-aec9-72bc376b757c kernel: kmem_cache_destroy+0x203/0x220
> > > > Feb 11 14:17:31 57c4c63f-e817-4e22-aec9-72bc376b757c kernel: rxe_cache_clean+0x41/0x60 [rdma_rxe]
> > > > Feb 11 14:17:31 57c4c63f-e817-4e22-aec9-72bc376b757c kernel: rxe_module_exit+0xf/0x68 [rdma_rxe]
> > > > Feb 11 14:17:31 57c4c63f-e817-4e22-aec9-72bc376b757c kernel: SyS_delete_module+0x175/0x270
> > > > Feb 11 14:17:31 57c4c63f-e817-4e22-aec9-72bc376b757c kernel: do_syscall_64+0x74/0x190
> > > > Feb 11 14:17:31 57c4c63f-e817-4e22-aec9-72bc376b757c kernel: entry_SYSCALL_64_after_hwframe+0x3d/0xa2
> > > > Feb 11 14:17:31 57c4c63f-e817-4e22-aec9-72bc376b757c kernel: RIP: 0033:0x7ff146d3f517
> > > > Feb 11 14:17:31 57c4c63f-e817-4e22-aec9-72bc376b757c kernel: RSP: 002b:00007ffd4b5c1598 EFLAGS: 00000202 ORIG_RAX: 00000000000000b0
> > > > Feb 11 14:17:31 57c4c63f-e817-4e22-aec9-72bc376b757c kernel: RAX: ffffffffffffffda RBX: 0000000000d78280 RCX: 00007ff146d3f517
> > > > Feb 11 14:17:31 57c4c63f-e817-4e22-aec9-72bc376b757c kernel: RDX: 00007ff146db3ca0 RSI: 0000000000000800 RDI: 0000000000d782e8
> > > > Feb 11 14:17:31 57c4c63f-e817-4e22-aec9-72bc376b757c kernel: RBP: 0000000000000000 R08: 00007ff147008060 R09: 00007ff146db3ca0
> > > > Feb 11 14:17:31 57c4c63f-e817-4e22-aec9-72bc376b757c kernel: R10: 00007ffd4b5c1020 R11: 0000000000000202 R12: 00007ffd4b5c36ca
> > > > Feb 11 14:17:31 57c4c63f-e817-4e22-aec9-72bc376b757c kernel: R13: 0000000000000000 R14: 0000000000d78280 R15: 0000000000d78010