On Mon, Mar 2, 2020 at 5:43 PM John Garry <john.garry@xxxxxxxxxx> wrote:
>
> >
> Hi Sumit,
>
> > It is a megaraid_sas driver bug. The driver does not free up resources properly when
> > scsi_add_host() fails. Please try the attached patch.
>
> Yeah, that looks to work. The driver gracefully failed to bind.

OK, I will send this patch upstream.

>
> However we might have lots of memory leaks:

Thanks for pointing it out, John. I will look into these memory leaks and send
patches with fixes in the next few days.

Thanks,
Sumit

>
>
> root@(none)$ echo scan > /sys/kernel/debug/kmemleak
> root@(none)$ [ 140.585484] kmemleak: 259 new suspected memory leaks
> (see /sys/kernel/debug/kmemleak)
> [ 140.585484] kmemleak: 259 new suspected memory leaks (see
> /sys/kernel/debug/kmemleak)
>
> root@(none)$
> root@(none)$ more /sys/kernel/debug/kmemleak
> unreferenced object 0xffff0026b9184c00 (size 512):
>   comm "kworker/0:0", pid 5, jiffies 4294903201 (age 95.768s)
>   hex dump (first 32 bytes):
>     60 00 00 00 61 00 00 00 62 00 00 00 63 00 00 00 `...a...b...c...
>     64 00 00 00 65 00 00 00 66 00 00 00 67 00 00 00 d...e...f...g...
>   backtrace:
>     [<(____ptrval____)>] slab_post_alloc_hook+0x6c/0xa0
>     [<(____ptrval____)>] __kmalloc+0x174/0x280
>     [<(____ptrval____)>] megasas_probe_one+0x798/0x2878
>     [<(____ptrval____)>] local_pci_probe+0x74/0xf0
>     [<(____ptrval____)>] work_for_cpu_fn+0x2c/0x48
>     [<(____ptrval____)>] process_one_work+0x488/0xc08
>     [<(____ptrval____)>] worker_thread+0x330/0x5d0
>     [<(____ptrval____)>] kthread+0x1c8/0x1d0
>     [<(____ptrval____)>] ret_from_fork+0x10/0x18
> unreferenced object 0xffff0026b922c000 (size 4096):
>   comm "kworker/0:0", pid 5, jiffies 4294903201 (age 95.768s)
>   hex dump (first 32 bytes):
>     00 00 21 b7 26 00 ff ff 00 00 9f ff 00 00 00 00 ..!.&...........
>     00 10 22 10 00 a0 ff ff 00 00 00 00 00 00 00 00 .."..............
>   backtrace:
>     [<(____ptrval____)>] slab_post_alloc_hook+0x6c/0xa0
>     [<(____ptrval____)>] kmem_cache_alloc_trace+0x140/0x228
>     [<(____ptrval____)>] megasas_alloc_fusion_context+0x30/0x1b0
>     [<(____ptrval____)>] megasas_probe_one+0x7d8/0x2878
>     [<(____ptrval____)>] local_pci_probe+0x74/0xf0
>     [<(____ptrval____)>] work_for_cpu_fn+0x2c/0x48
>     [<(____ptrval____)>] process_one_work+0x488/0xc08
>     [<(____ptrval____)>] worker_thread+0x330/0x5d0
>     [<(____ptrval____)>] kthread+0x1c8/0x1d0
>     [<(____ptrval____)>] ret_from_fork+0x10/0x18
> unreferenced object 0xffff0026b7013000 (size 2048):
>   comm "kworker/0:0", pid 5, jiffies 4294903512 (age 94.540s)
>   hex dump (first 32 bytes):
>     00 58 18 b9 26 00 ff ff 00 5c 18 b9 26 00 ff ff .X..&....\..&...
>     00 60 18 b9 26 00 ff ff 00 64 18 b9 26 00 ff ff .`..&....d..&...
>   backtrace:
>     [<(____ptrval____)>] slab_post_alloc_hook+0x6c/0xa0
>     [<(____ptrval____)>] kmem_cache_alloc_trace+0x140/0x228
> root@(none)$
>
>
> Thanks,
> John
>
> >
> > Thanks,
> > Sumit
> >>
> >> [ 62.516871] megasas: 07.713.01.00-rc1
> >> [ 62.526189] megaraid_sas 0000:08:00.0: Adding to iommu group 1
> >> [ 62.571790] megaraid_sas 0000:08:00.0: BAR:0x0 BAR's
> >> base_addr(phys):0x0000080010000000 mapped virt_addr:0x(____ptrval____)
> >> [ 62.571802] megaraid_sas 0000:08:00.0: FW now in Ready state
> >> [ 62.583811] megaraid_sas 0000:08:00.0: 63 bit DMA mask and 63 bit
> >> consistent mask
> >> [ 62.602143] megaraid_sas 0000:08:00.0: firmware supports msix : (128)
> >> [ 62.780250] megaraid_sas 0000:08:00.0: requested/available msix 128/128
> >> [ 62.794292] megaraid_sas 0000:08:00.0: current msix/online cpus :
> >> (128/128)
> >> [ 62.809011] megaraid_sas 0000:08:00.0: RDPQ mode : (enabled)
> >> [ 62.820968] megaraid_sas 0000:08:00.0: Current firmware supports
> >> maximum commands: 4077 LDIO threshold: 0
> >> [ 62.937043] megaraid_sas 0000:08:00.0: Configured max firmware
> >> commands: 4076
> >> [ 63.509185] megaraid_sas 0000:08:00.0: Performance mode :Latency
> >> [ 63.521906] megaraid_sas 0000:08:00.0: FW supports sync cache : Yes
> >> [ 63.535148] megaraid_sas 0000:08:00.0: megasas_disable_intr_fusion is
> >> called outbound_intr_mask:0x40000009
> >> [ 63.610607] megaraid_sas 0000:08:00.0: FW provided supportMaxExtLDs:
> >> 1 max_lds: 64
> >> [ 63.626618] megaraid_sas 0000:08:00.0: controller type : MR(2048MB)
> >> [ 63.639870] megaraid_sas 0000:08:00.0: Online Controller Reset(OCR) :
> >> Enabled
> >> [ 63.654945] megaraid_sas 0000:08:00.0: Secure JBOD support : Yes
> >> [ 63.667661] megaraid_sas 0000:08:00.0: NVMe passthru support : Yes
> >> [ 63.667672] megaraid_sas 0000:08:00.0: FW provided TM TaskAbort/Reset
> >> timeout : 6 secs/60 secs
> >> [ 63.698922] megaraid_sas 0000:08:00.0: JBOD sequence map support : Yes
> >> [ 63.712715] megaraid_sas 0000:08:00.0: PCI Lane Margining support : No
> >> [ 63.754764] megaraid_sas 0000:08:00.0: NVME page size : (4096)
> >> [ 63.787258] megaraid_sas 0000:08:00.0: megasas_enable_intr_fusion is
> >> called outbound_intr_mask:0x40000000
> >> [ 63.807485] megaraid_sas 0000:08:00.0: INIT adapter done
> >> [ 63.822235] megaraid_sas 0000:08:00.0: pci id :
> >> (0x1000)/(0x0016)/(0x19e5)/(0xd215)
> >> [ 63.838652] megaraid_sas 0000:08:00.0: unevenspan support : no
> >> [ 63.850980] megaraid_sas 0000:08:00.0: firmware crash dump : no
> >> [ 63.863499] megaraid_sas 0000:08:00.0: JBOD sequence map : enabled
> >> [ 63.877352] scsi host0: Avago SAS based MegaRAID driver
> >> [ 63.890398] megaraid_sas 0000:08:00.0: Failed to add host from
> >> megasas_io_attach 6802
> >> [ 63.906999] megaraid_sas 0000:08:00.0: megasas_disable_intr_fusion is
> >> called outbound_intr_mask:0x40000009
> >> [ 64.591755] nvme 0000:81:00.0: Adding to iommu group 2
> >> [ 64.636476] nvme nvme0: pci function 0000:81:00.0
> >> [ 64.669635] libphy: Fixed MDIO Bus: probed
> >> [ 64.680255] tun: Universal TUN/TAP device driver, 1.6
> >> [ 64.694422] thunder_xcv, ver 1.0
> >> [ 64.702042] thunder_bgx, ver 1.0
> >> [ 64.709277] nicpf, ver 1.0
> >> [ 64.718144] e1000e: Intel(R) PRO/1000 Network Driver - 3.2.6-k
> >> [ 64.730402] e1000e: Copyright(c) 1999 - 2015 Intel Corporation.
> >> [ 64.743337] igb: Intel(R) Gigabit Ethernet Network Driver - version
> >> 5.6.0-k
> >> [ 64.754981] nvme nvme0: Removing after probe failure status: -12
> >> [ 64.757953] igb: Copyright (c) 2007-2014 Intel Corporation.
> >> [ 64.782805] igbvf: Intel(R) Gigabit Virtual Function Network Driver -
> >> version 2.4.0-k
> >> [ 64.799423] igbvf: Copyright (c) 2009 - 2012 Intel Corporation.
> >> [ 64.813848] sky2: driver version 1.30
> >> [ 64.825564] VFIO - User Level meta-driver version: 0.3
> >> [ 64.848089] ehci_hcd: USB 2.0 'Enhanced' Host Controller (EHCI) Driver
> >> [ 64.862029] ehci-pci: EHCI PCI platform driver
> >> [ 64.873445] ehci-pci 0000:7a:01.0: Adding to iommu group 3
> >> [ 64.886700]
> >> ==================================================================
> >> [ 64.901999] BUG: KASAN: slab-out-of-bounds in
> >> run_timer_softirq+0x6f4/0xae0
> >> [ 64.916663] Write of size 8 at addr ffff0026b931aae0 by task swapper/0/0
> >>
> >> [ 64.933914] CPU: 0 PID: 0 Comm: swapper/0 Not tainted
> >> 5.6.0-rc3-00005-g17ceebe3a05c-dirty #1775
> >> [ 64.952240] Hardware name: Huawei TaiShan 200 (Model 2280)/BC82AMDD,
> >> BIOS 2280-V2 CS V3.B160.01 02/24/2020
> >> [ 64.972575] Call trace:
> >> [ 64.977729] dump_backtrace+0x0/0x298
> >> [ 64.985439] show_stack+0x14/0x20
> >> [ 64.992418] dump_stack+0x118/0x190
> >> [ 64.999762] print_address_description.isra.9+0x6c/0x3b8
> >> [ 65.010953] __kasan_report+0x134/0x23c
> >> [ 65.019029] kasan_report+0xc/0x18
> >> [ 65.026188] __asan_store8+0x94/0xb8
> >> [ 65.033720] run_timer_softirq+0x6f4/0xae0
> >> [ 65.042343] efi_header_end+0x16c/0x840
> >> [ 65.050420] irq_exit+0x19c/0x1a8
> >> [ 65.057396] __handle_domain_irq+0x7c/0xe0
> >> [ 65.066022] gic_handle_irq+0x64/0x168
> >> [ 65.073917] el1_irq+0xbc/0x180
> >> [ 65.080528] arch_cpu_idle+0x3c/0x320
> >> [ 65.088239] default_idle_call+0x28/0x4c
> >> [ 65.096502] do_idle+0x278/0x348
> >> [ 65.103295] cpu_startup_entry+0x24/0x40
> >> [ 65.111554] rest_init+0x1c4/0x298
> >> [ 65.118718] arch_call_rest_init+0xc/0x14
> >> [ 65.127159] start_kernel+0x848/0x888
> >>
> >> [ 65.138006] Allocated by task 0:
> >> [ 65.144802] (stack is not available)
> >>
> >> [ 65.155465] Freed by task 0:
> >> [ 65.161530] (stack is not available)
> >>
> >> [ 65.172193] The buggy address belongs to the object at ffff0026b931aa00
> >> which belongs to the cache pool_workqueue of size 256
> >> [ 65.199113] The buggy address is located 224 bytes inside of
> >> 256-byte region [ffff0026b931aa00, ffff0026b931ab00)
> >> [ 65.223840] The buggy address belongs to the page:
> >> [ 65.233931] page:fffffe009ac4c600 refcount:1 mapcount:0
> >> mapping:ffff0026dd81c880 index:0xffff0026b931fe00 compound_mapcount: 0
> >> [ 65.257923] flags: 0x6ffff00000010200(slab|head)
> >> [ 65.267649] raw: 6ffff00000010200 fffffe009b20b208 fffffe009ac07608
> >> ffff0026dd81c880
> >> [ 65.283959] raw: ffff0026b931fe00 0000000000400002 00000001ffffffff
> >> 0000000000000000
> >> [ 65.300270] page dumped because: kasan: bad access detected
> >>
> >> [ 65.315139] Memory state around the buggy address:
> >> [ 65.325231] ffff0026b931a980: fc fc fc fc fc fc fc fc fc fc fc fc fc
> >> fc fc fc
> >> [ 65.340445] ffff0026b931aa00: fc fc fc fc fc fc fc fc fc fc fc fc fc
> >> fc fc fc
> >> [ 65.355660] >ffff0026b931aa80: fc fc fc fc fc fc fc fc fc fc fc fc fc
> >> fc fc fc
> >> [ 65.370870] ^
> >> [ 65.384256] ffff0026b931ab00: fc fc fc fc fc fc fc fc fc fc fc fc fc
> >> fc fc fc
> >> [ 65.399467] ffff0026b931ab80: fc fc fc fc fc fc fc fc fc fc fc fc fc
> >> fc fc fc
> >> [ 65.414675]
> >> ==================================================================
> >> [ 65.429885] Disabling lock debugging due to kernel taint
> >> [ 65.441431] Unable to handle kernel paging request at virtual address
> >> ffffa0001013c0b0
> >> [ 65.441695] ehci-pci 0000:7a:01.0: EHCI Host Controller
> >> [ 65.458088] Mem abort info:
> >> [ 65.469183] ehci-pci 0000:7a:01.0: new USB bus registered, assigned
> >> bus number 1
> >> [ 65.474927] ESR = 0x96000007
> >> [ 65.491201] ehci-pci 0000:7a:01.0: irq 65, io mem 0x20c101000
> >> [ 65.496913] EC = 0x25: DABT (current EL), IL = 32 bits
> >> [ 65.496918] SET = 0, FnV = 0
> >> [ 65.496922] EA = 0, S1PTW = 0
> >> [ 65.522586] ehci-pci 0000:7a:01.0: USB 0.0 started, EHCI 1.00
> >> [ 65.526575] Data abort info:
> >> [ 65.526580] ISV = 0, ISS = 0x00000007
> >> [ 65.535948] hub 1-0:1.0: USB hub found
> >> [ 65.545245] CM = 0, WnR = 0
> >> [ 65.545251] swapper pgtable: 4k pages, 48-bit VAs, pgdp=0000000052530000
> >> [ 65.545256] [ffffa0001013c0b0] pgd=00002027fffff003,
> >> pud=00002027ffffe003, pmd=00000026dda5b003, pte=0000000000000000
> >> [ 65.551519] hub 1-0:1.0: 2 ports detected
> >> [ 65.559375] Internal error: Oops: 96000007 [#1] PREEMPT SMP
> >> [ 65.559379] Modules linked in:
> >> [ 65.569534] ehci-platform: EHCI generic platform driver
> >> [ 65.573475] CPU: 34 PID: 8 Comm: kworker/u256:0 Tainted: G B
> >> 5.6.0-rc3-00005-g17ceebe3a05c-dirty #1775
> >> [ 65.573477] Hardware name: Huawei TaiShan 200 (Model 2280)/BC82AMDD,
> >> BIOS 2280-V2 CS V3.B160.01 02/24/2020
> >> [ 65.573487] Workqueue: poll_megasas0_status megasas_fault_detect_work
> >> [ 65.573492] pstate: 80c00009 (Nzcv daif +PAN +UAO)
> >> [ 65.588048] ehci-orion: EHCI orion driver
> >> [ 65.609756] pc : megasas_readl+0x60/0x80
> >> [ 65.609759] lr : megasas_readl+0x1c/0x80
> >> [ 65.609761] sp : ffff0026d97bfc00
> >> [ 65.609763] x29: ffff0026d97bfc00 x28: ffff0026d97a9890
> >> [ 65.609767] x27: ffff0026d97a0618 x26: ffff0026d97a9880
> >> [ 65.609771] x25: ffff0026d9758808 x24: ffff0026b931aa28
> >> [ 65.609775] x23: ffff0026b931aa98 x22: ffffa0002931e000
> >> [ 65.609779] x21: ffff0026dd898800 x20: ffff0026b931dcd8
> >> [ 65.618543] ehci-exynos: EHCI Exynos driver
> >> [ 65.629840] x19: ffffa0001013c0b0 x18: 0000000000000000
> >> [ 65.629843] x17: 0000000000001d50 x16: ffffffffffffe240
> >> [ 65.629847] x15: 00000000000013a8 x14: 0000000000000000
> >> [ 65.629850] x13: 00000000000013a0 x12: 1fffe004db2f7f7c
> >> [ 65.629854] x11: ffff8004db2f7f78 x10: dfffa00000000000
> >> [ 65.629857] x9 : ffffa00028f679e8 x8 : ffffa0002a483a48
> >> [ 65.629861] x7 : ffffa00026d5ed94 x6 : 0000000000000000
> >> [ 65.629864] x5 : ffffa0002a483a48 x4 : 0000000000000000
> >> [ 65.629868] x3 : ffffa000279df03c x2 : 0000000000000000
> >> [ 65.636662] ohci_hcd: USB 1.1 'Open' Host Controller (OHCI) Driver
> >> [ 65.647207] x1 : ef244e124d671400 x0 : 0000000000000004
> >> [ 65.647210] Call trace:
> >> [ 65.647214] megasas_readl+0x60/0x80
> >> [ 65.647218] megasas_read_fw_status_reg_fusion+0x2c/0x38
> >> [ 65.647221] megasas_fault_detect_work+0x44/0x520
> >> [ 65.647226] process_one_work+0x488/0xc08
> >> [ 65.647228] worker_thread+0x68/0x5d0
> >> [ 65.647233] kthread+0x1c8/0x1d0
> >> [ 65.669535] ohci-pci: OHCI PCI platform driver
> >> [ 65.689683] ret_from_fork+0x10/0x18
> >> [ 65.689689] Code: 54ffff09 a94153f3 a8c27bfd d65f03c0 (b9400260)
> >> [ 65.689695] ---[ end trace 3632c7efc4f2d69c ]---
> >>
> >>
> >> That's 5.6-rc3.
> >>
> >> Please have a look,
> >>
> >> John
> >>
> >>
> >>
>