Re: megaraid_sas problem for scsi_add_host() fail

On Fri, Feb 28, 2020 at 12:11 AM John Garry <john.garry@xxxxxxxxxx> wrote:
>
> Hi guys,
>
Hi John,

> While doing some testing, I noticed that the megaraid SAS driver has a
> problem and looks to corrupt my system when the call to scsi_add_host()
> in the driver fails:
It is a megaraid_sas driver bug. The driver does not free up resources properly
when scsi_add_host() fails. Please try the attached patch.
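
The patch itself is attached as binary data and is not reproduced here. In the quoted log, the oops is raised from megasas_fault_detect_work (workqueue poll_megasas0_status) after the "Failed to add host" message, which suggests that periodic firmware-status polling outlived the failed probe. What follows is only a generic userspace C sketch of the goto-based unwind idiom that kernel probe routines commonly use. struct fake_adapter, fake_probe() and fake_add_host() are hypothetical stand-ins, not megaraid_sas code, and the sketch does not describe what the attached patch changes.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical stand-ins for per-adapter resources; NOT the real
 * megaraid_sas structures. */
struct fake_adapter {
	void *regs;            /* stands in for the ioremap()ed BAR */
	bool fault_work_armed; /* stands in for a periodic fault-detect work item */
	bool host_added;       /* set once the (fake) host registration succeeds */
};

/* Hypothetical stand-in for scsi_add_host(); fails when asked to. */
static int fake_add_host(struct fake_adapter *a, bool fail)
{
	if (fail)
		return -ENOMEM;
	a->host_added = true;
	return 0;
}

/* Probe-style setup with goto-based unwinding. The failure path undoes
 * the steps that already succeeded, in reverse order of setup. */
static int fake_probe(struct fake_adapter *a, bool fail_add_host)
{
	int ret;

	a->regs = malloc(64);          /* step 1: "map" the registers */
	if (!a->regs)
		return -ENOMEM;

	a->fault_work_armed = true;    /* step 2: start polling firmware status */

	ret = fake_add_host(a, fail_add_host); /* step 3: register the host */
	if (ret)
		goto err_unwind;

	return 0;

err_unwind:
	/* Stop the poller first, so nothing reads the registers after
	 * they are released. */
	a->fault_work_armed = false;
	free(a->regs);
	a->regs = NULL;
	return ret;
}
```

The design point is ordering: anything that can still touch the device (here, the stand-in for the fault-detect poller) is stopped before the resources it reads are released.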

Thanks,
Sumit
>
> [   62.516871] megasas: 07.713.01.00-rc1
> [   62.526189] megaraid_sas 0000:08:00.0: Adding to iommu group 1
> [   62.571790] megaraid_sas 0000:08:00.0: BAR:0x0  BAR's base_addr(phys):0x0000080010000000  mapped virt_addr:0x(____ptrval____)
> [   62.571802] megaraid_sas 0000:08:00.0: FW now in Ready state
> [   62.583811] megaraid_sas 0000:08:00.0: 63 bit DMA mask and 63 bit consistent mask
> [   62.602143] megaraid_sas 0000:08:00.0: firmware supports msix : (128)
> [   62.780250] megaraid_sas 0000:08:00.0: requested/available msix 128/128
> [   62.794292] megaraid_sas 0000:08:00.0: current msix/online cpus : (128/128)
> [   62.809011] megaraid_sas 0000:08:00.0: RDPQ mode : (enabled)
> [   62.820968] megaraid_sas 0000:08:00.0: Current firmware supports maximum commands: 4077 LDIO threshold: 0
> [   62.937043] megaraid_sas 0000:08:00.0: Configured max firmware commands: 4076
> [   63.509185] megaraid_sas 0000:08:00.0: Performance mode :Latency
> [   63.521906] megaraid_sas 0000:08:00.0: FW supports sync cache : Yes
> [   63.535148] megaraid_sas 0000:08:00.0: megasas_disable_intr_fusion is called outbound_intr_mask:0x40000009
> [   63.610607] megaraid_sas 0000:08:00.0: FW provided supportMaxExtLDs: 1 max_lds: 64
> [   63.626618] megaraid_sas 0000:08:00.0: controller type : MR(2048MB)
> [   63.639870] megaraid_sas 0000:08:00.0: Online Controller Reset(OCR) : Enabled
> [   63.654945] megaraid_sas 0000:08:00.0: Secure JBOD support : Yes
> [   63.667661] megaraid_sas 0000:08:00.0: NVMe passthru support : Yes
> [   63.667672] megaraid_sas 0000:08:00.0: FW provided TM TaskAbort/Reset timeout : 6 secs/60 secs
> [   63.698922] megaraid_sas 0000:08:00.0: JBOD sequence map support : Yes
> [   63.712715] megaraid_sas 0000:08:00.0: PCI Lane Margining support : No
> [   63.754764] megaraid_sas 0000:08:00.0: NVME page size : (4096)
> [   63.787258] megaraid_sas 0000:08:00.0: megasas_enable_intr_fusion is called outbound_intr_mask:0x40000000
> [   63.807485] megaraid_sas 0000:08:00.0: INIT adapter done
> [   63.822235] megaraid_sas 0000:08:00.0: pci id : (0x1000)/(0x0016)/(0x19e5)/(0xd215)
> [   63.838652] megaraid_sas 0000:08:00.0: unevenspan support : no
> [   63.850980] megaraid_sas 0000:08:00.0: firmware crash dump : no
> [   63.863499] megaraid_sas 0000:08:00.0: JBOD sequence map : enabled
> [   63.877352] scsi host0: Avago SAS based MegaRAID driver
> [   63.890398] megaraid_sas 0000:08:00.0: Failed to add host from megasas_io_attach 6802
> [   63.906999] megaraid_sas 0000:08:00.0: megasas_disable_intr_fusion is called outbound_intr_mask:0x40000009
> [   64.591755] nvme 0000:81:00.0: Adding to iommu group 2
> [   64.636476] nvme nvme0: pci function 0000:81:00.0
> [   64.669635] libphy: Fixed MDIO Bus: probed
> [   64.680255] tun: Universal TUN/TAP device driver, 1.6
> [   64.694422] thunder_xcv, ver 1.0
> [   64.702042] thunder_bgx, ver 1.0
> [   64.709277] nicpf, ver 1.0
> [   64.718144] e1000e: Intel(R) PRO/1000 Network Driver - 3.2.6-k
> [   64.730402] e1000e: Copyright(c) 1999 - 2015 Intel Corporation.
> [   64.743337] igb: Intel(R) Gigabit Ethernet Network Driver - version 5.6.0-k
> [   64.754981] nvme nvme0: Removing after probe failure status: -12
> [   64.757953] igb: Copyright (c) 2007-2014 Intel Corporation.
> [   64.782805] igbvf: Intel(R) Gigabit Virtual Function Network Driver - version 2.4.0-k
> [   64.799423] igbvf: Copyright (c) 2009 - 2012 Intel Corporation.
> [   64.813848] sky2: driver version 1.30
> [   64.825564] VFIO - User Level meta-driver version: 0.3
> [   64.848089] ehci_hcd: USB 2.0 'Enhanced' Host Controller (EHCI) Driver
> [   64.862029] ehci-pci: EHCI PCI platform driver
> [   64.873445] ehci-pci 0000:7a:01.0: Adding to iommu group 3
> [   64.886700] ==================================================================
> [   64.901999] BUG: KASAN: slab-out-of-bounds in run_timer_softirq+0x6f4/0xae0
> [   64.916663] Write of size 8 at addr ffff0026b931aae0 by task swapper/0/0
>
> [   64.933914] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 5.6.0-rc3-00005-g17ceebe3a05c-dirty #1775
> [   64.952240] Hardware name: Huawei TaiShan 200 (Model 2280)/BC82AMDD, BIOS 2280-V2 CS V3.B160.01 02/24/2020
> [   64.972575] Call trace:
> [   64.977729]  dump_backtrace+0x0/0x298
> [   64.985439]  show_stack+0x14/0x20
> [   64.992418]  dump_stack+0x118/0x190
> [   64.999762]  print_address_description.isra.9+0x6c/0x3b8
> [   65.010953]  __kasan_report+0x134/0x23c
> [   65.019029]  kasan_report+0xc/0x18
> [   65.026188]  __asan_store8+0x94/0xb8
> [   65.033720]  run_timer_softirq+0x6f4/0xae0
> [   65.042343]  efi_header_end+0x16c/0x840
> [   65.050420]  irq_exit+0x19c/0x1a8
> [   65.057396]  __handle_domain_irq+0x7c/0xe0
> [   65.066022]  gic_handle_irq+0x64/0x168
> [   65.073917]  el1_irq+0xbc/0x180
> [   65.080528]  arch_cpu_idle+0x3c/0x320
> [   65.088239]  default_idle_call+0x28/0x4c
> [   65.096502]  do_idle+0x278/0x348
> [   65.103295]  cpu_startup_entry+0x24/0x40
> [   65.111554]  rest_init+0x1c4/0x298
> [   65.118718]  arch_call_rest_init+0xc/0x14
> [   65.127159]  start_kernel+0x848/0x888
>
> [   65.138006] Allocated by task 0:
> [   65.144802] (stack is not available)
>
> [   65.155465] Freed by task 0:
> [   65.161530] (stack is not available)
>
> [   65.172193] The buggy address belongs to the object at ffff0026b931aa00
>   which belongs to the cache pool_workqueue of size 256
> [   65.199113] The buggy address is located 224 bytes inside of
>   256-byte region [ffff0026b931aa00, ffff0026b931ab00)
> [   65.223840] The buggy address belongs to the page:
> [   65.233931] page:fffffe009ac4c600 refcount:1 mapcount:0 mapping:ffff0026dd81c880 index:0xffff0026b931fe00 compound_mapcount: 0
> [   65.257923] flags: 0x6ffff00000010200(slab|head)
> [   65.267649] raw: 6ffff00000010200 fffffe009b20b208 fffffe009ac07608 ffff0026dd81c880
> [   65.283959] raw: ffff0026b931fe00 0000000000400002 00000001ffffffff 0000000000000000
> [   65.300270] page dumped because: kasan: bad access detected
>
> [   65.315139] Memory state around the buggy address:
> [   65.325231]  ffff0026b931a980: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
> [   65.340445]  ffff0026b931aa00: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
> [   65.355660] >ffff0026b931aa80: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
> [   65.370870]                                                        ^
> [   65.384256]  ffff0026b931ab00: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
> [   65.399467]  ffff0026b931ab80: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
> [   65.414675] ==================================================================
> [   65.429885] Disabling lock debugging due to kernel taint
> [   65.441431] Unable to handle kernel paging request at virtual address ffffa0001013c0b0
> [   65.441695] ehci-pci 0000:7a:01.0: EHCI Host Controller
> [   65.458088] Mem abort info:
> [   65.469183] ehci-pci 0000:7a:01.0: new USB bus registered, assigned bus number 1
> [   65.474927]   ESR = 0x96000007
> [   65.491201] ehci-pci 0000:7a:01.0: irq 65, io mem 0x20c101000
> [   65.496913]   EC = 0x25: DABT (current EL), IL = 32 bits
> [   65.496918]   SET = 0, FnV = 0
> [   65.496922]   EA = 0, S1PTW = 0
> [   65.522586] ehci-pci 0000:7a:01.0: USB 0.0 started, EHCI 1.00
> [   65.526575] Data abort info:
> [   65.526580]   ISV = 0, ISS = 0x00000007
> [   65.535948] hub 1-0:1.0: USB hub found
> [   65.545245]   CM = 0, WnR = 0
> [   65.545251] swapper pgtable: 4k pages, 48-bit VAs, pgdp=0000000052530000
> [   65.545256] [ffffa0001013c0b0] pgd=00002027fffff003, pud=00002027ffffe003, pmd=00000026dda5b003, pte=0000000000000000
> [   65.551519] hub 1-0:1.0: 2 ports detected
> [   65.559375] Internal error: Oops: 96000007 [#1] PREEMPT SMP
> [   65.559379] Modules linked in:
> [   65.569534] ehci-platform: EHCI generic platform driver
> [   65.573475] CPU: 34 PID: 8 Comm: kworker/u256:0 Tainted: G    B       5.6.0-rc3-00005-g17ceebe3a05c-dirty #1775
> [   65.573477] Hardware name: Huawei TaiShan 200 (Model 2280)/BC82AMDD, BIOS 2280-V2 CS V3.B160.01 02/24/2020
> [   65.573487] Workqueue: poll_megasas0_status megasas_fault_detect_work
> [   65.573492] pstate: 80c00009 (Nzcv daif +PAN +UAO)
> [   65.588048] ehci-orion: EHCI orion driver
> [   65.609756] pc : megasas_readl+0x60/0x80
> [   65.609759] lr : megasas_readl+0x1c/0x80
> [   65.609761] sp : ffff0026d97bfc00
> [   65.609763] x29: ffff0026d97bfc00 x28: ffff0026d97a9890
> [   65.609767] x27: ffff0026d97a0618 x26: ffff0026d97a9880
> [   65.609771] x25: ffff0026d9758808 x24: ffff0026b931aa28
> [   65.609775] x23: ffff0026b931aa98 x22: ffffa0002931e000
> [   65.609779] x21: ffff0026dd898800 x20: ffff0026b931dcd8
> [   65.618543] ehci-exynos: EHCI Exynos driver
> [   65.629840] x19: ffffa0001013c0b0 x18: 0000000000000000
> [   65.629843] x17: 0000000000001d50 x16: ffffffffffffe240
> [   65.629847] x15: 00000000000013a8 x14: 0000000000000000
> [   65.629850] x13: 00000000000013a0 x12: 1fffe004db2f7f7c
> [   65.629854] x11: ffff8004db2f7f78 x10: dfffa00000000000
> [   65.629857] x9 : ffffa00028f679e8 x8 : ffffa0002a483a48
> [   65.629861] x7 : ffffa00026d5ed94 x6 : 0000000000000000
> [   65.629864] x5 : ffffa0002a483a48 x4 : 0000000000000000
> [   65.629868] x3 : ffffa000279df03c x2 : 0000000000000000
> [   65.636662] ohci_hcd: USB 1.1 'Open' Host Controller (OHCI) Driver
> [   65.647207] x1 : ef244e124d671400 x0 : 0000000000000004
> [   65.647210] Call trace:
> [   65.647214]  megasas_readl+0x60/0x80
> [   65.647218]  megasas_read_fw_status_reg_fusion+0x2c/0x38
> [   65.647221]  megasas_fault_detect_work+0x44/0x520
> [   65.647226]  process_one_work+0x488/0xc08
> [   65.647228]  worker_thread+0x68/0x5d0
> [   65.647233]  kthread+0x1c8/0x1d0
> [   65.669535] ohci-pci: OHCI PCI platform driver
> [   65.689683]  ret_from_fork+0x10/0x18
> [   65.689689] Code: 54ffff09 a94153f3 a8c27bfd d65f03c0 (b9400260)
> [   65.689695] ---[ end trace 3632c7efc4f2d69c ]---
>
>
> That's 5.6-rc3.
>
> Please have a look,
>
> John
>
>
>

Attachment: 0001-megaraid_sas-Cleanup-resources-properly-when-scsi_ad.patch
Description: Binary data

