On Tue, Nov 19, 2019 at 02:28:54PM +0100, Daniel Wagner wrote: > On Tue, Nov 12, 2019 at 10:15:00PM -0500, Martin K. Petersen wrote: > > > While trying to understand what's going on in the Oops below I figured > > > that it could be the result of the invalid pointer access. The patch > > > still needs testing by our customer but indepent of this I think the > > > patch fixes a real bug. > > I was able to reproduce the same stack trace with this patch > applied... That is obviously bad. The good news, I have access to this > machine, so maybe I able to figure out what's the root cause of this > crash. Forgot to append the KASAN trace which points at the same place. Don't know if this is the same thing or not. [ 329.217804] ================================================================== [ 329.280494] BUG: KASAN: slab-out-of-bounds in lpfc_sli4_io_xri_aborted+0x29c/0x3c0 [lpfc] [ 329.351654] Read of size 8 at addr ffff88984f160000 by task kworker/77:1/488 [ 329.396559] nvme nvme3: Removing ctrl: NQN "nqn.2014-08.org.nvmexpress.discovery" [ 329.412326] [ 329.412335] CPU: 77 PID: 488 Comm: kworker/77:1 Kdump: loaded Tainted: G E 5.4.0-rc1-default+ #3 [ 329.412338] Hardware name: HP ProLiant DL580 Gen9/ProLiant DL580 Gen9, BIOS U17 07/21/2019 [ 329.412414] Workqueue: lpfc_wq lpfc_sli4_hba_process_cq [lpfc] [ 329.428650] nvme nvme0: Removing ctrl: NQN "nqn.2014-08.org.nvmexpress.discovery" [ 329.765863] Call Trace: [ 329.765888] dump_stack+0x71/0xab [ 329.765967] ? lpfc_sli4_io_xri_aborted+0x29c/0x3c0 [lpfc] [ 329.765981] print_address_description.constprop.6+0x1b/0x2f0 [ 329.912961] ? lpfc_sli4_io_xri_aborted+0x29c/0x3c0 [lpfc] [ 329.913001] ? lpfc_sli4_io_xri_aborted+0x29c/0x3c0 [lpfc] [ 330.009190] __kasan_report+0x14e/0x192 [ 330.009255] ? lpfc_sli4_io_xri_aborted+0x29c/0x3c0 [lpfc] [ 330.009261] kasan_report+0xe/0x20 [ 330.120620] lpfc_sli4_io_xri_aborted+0x29c/0x3c0 [lpfc] [ 330.120660] lpfc_sli4_sp_handle_abort_xri_wcqe.isra.55+0x59/0x280 [lpfc] [ 330.226013] ? __update_load_avg_cfs_rq+0x244/0x470 [ 330.226052] ? lpfc_sli4_fp_handle_cqe+0x127/0x8e0 [lpfc] [ 330.226089] lpfc_sli4_fp_handle_cqe+0x127/0x8e0 [lpfc] [ 330.358896] ? lpfc_sli4_sp_handle_abort_xri_wcqe.isra.55+0x280/0x280 [lpfc] [ 330.358907] ? __switch_to_asm+0x40/0x70 [ 330.452995] ? __switch_to_asm+0x34/0x70 [ 330.452998] ? __switch_to_asm+0x40/0x70 [ 330.453000] ? __switch_to_asm+0x34/0x70 [ 330.453002] ? __switch_to_asm+0x40/0x70 [ 330.453005] ? __switch_to_asm+0x34/0x70 [ 330.453041] __lpfc_sli4_process_cq+0x1e1/0x470 [lpfc] [ 330.453078] ? lpfc_sli4_sp_handle_abort_xri_wcqe.isra.55+0x280/0x280 [lpfc] [ 330.728428] ? __switch_to_asm+0x40/0x70 [ 330.728466] __lpfc_sli4_hba_process_cq+0x88/0x1d0 [lpfc] [ 330.728503] ? lpfc_sli4_fp_handle_cqe+0x8e0/0x8e0 [lpfc] [ 330.855605] process_one_work+0x46e/0x7f0 [ 330.855610] worker_thread+0x69/0x6b0 [ 330.855615] ? process_one_work+0x7f0/0x7f0 [ 330.855620] kthread+0x1b3/0x1d0 [ 330.855624] ? kthread_create_worker_on_cpu+0xc0/0xc0 [ 330.855627] ret_from_fork+0x35/0x40 [ 330.855631] [ 330.855634] Allocated by task 5171: [ 330.855644] save_stack+0x19/0x80 [ 330.855650] __kasan_kmalloc.constprop.9+0xa0/0xd0 [ 331.175452] __kmalloc+0xfb/0x5d0 [ 331.175461] alloc_pipe_info+0xff/0x210 [ 331.175464] create_pipe_files+0x66/0x2e0 [ 331.175467] __do_pipe_flags+0x2c/0x100 [ 331.175470] do_pipe2+0x80/0x130 [ 331.175472] __x64_sys_pipe2+0x2b/0x30 [ 331.175486] do_syscall_64+0x73/0x230 [ 331.395309] entry_SYSCALL_64_after_hwframe+0x44/0xa9 [ 331.395310] [ 331.395312] Freed by task 5171: [ 331.395317] save_stack+0x19/0x80 [ 331.395319] __kasan_slab_free+0x105/0x150 [ 331.395321] kfree+0xa6/0x150 [ 331.395324] free_pipe_info+0x106/0x120 [ 331.395327] pipe_release+0xcb/0xf0 [ 331.395335] __fput+0x11d/0x330 [ 331.395338] task_work_run+0xc6/0xf0 [ 331.395344] exit_to_usermode_loop+0x11d/0x120 [ 331.730019] do_syscall_64+0x203/0x230 [ 331.730023] entry_SYSCALL_64_after_hwframe+0x44/0xa9 [ 331.730023] [ 331.730027] The buggy address belongs to the object at ffff88984f160040 [ 331.730027] which belongs to the cache kmalloc-1k of size 1024 [ 331.730030] The buggy address is located 64 bytes to the left of [ 331.730030] 1024-byte region [ffff88984f160040, ffff88984f160440) [ 331.730031] The buggy address belongs to the page: [ 331.730036] page:ffffea00613c5800 refcount:1 mapcount:0 mapping:ffff888107c00700 index:0x0 compound_mapcount: 0 [ 331.730042] flags: 0x97ffffc0010200(slab|head) [ 331.730050] raw: 0097ffffc0010200 ffffea00613c4608 ffffea00613c7f88 ffff888107c00700 [ 332.266508] raw: 0000000000000000 ffff88984f160040 0000000100000007 0000000000000000 [ 332.266509] page dumped because: kasan: bad access detected [ 332.266510] [ 332.266511] Memory state around the buggy address: [ 332.266516] ffff88984f15ff00: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc [ 332.266518] ffff88984f15ff80: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc [ 332.266521] >ffff88984f160000: fc fc fc fc fc fc fc fc fb fb fb fb fb fb fb fb [ 332.266522] ^ [ 332.266525] ffff88984f160080: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb [ 332.266527] ffff88984f160100: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb [ 332.266528] ================================================================== The kernel I used to create the above KASAN trace is mkp/queue (clean without my patch), c0bf9a264e10 ("scsi: iscsi: Don't send data to unbound connection")