Hi Suganath,

Your proposed idea worked!

I used the unmodified 5.8.12 kernel from my previous test and just added the
following kernel command line parameters:
'mpt3sas.logging_level=0x3f8 mpt3sas.max_queue_depth=10000'.

All the drives in my SAS enclosure were detected and everything seems to be
working fine.
I am attaching the full dmesg from this run.

Thank you!

On Wed, Sep 30, 2020 at 6:57 AM Suganath Prabu Subramani
<suganath-prabu.subramani@xxxxxxxxxxxx> wrote:
>
> Hi Sundar,
>
> Thanks for the logs,
> From log, I could see that the HBA queue depth is very high "32455" as
> shown below.
> [ 11.465416] mpt2sas_cm0: hba queue depth(32455), max chains per io(128).
>
> In this patch "https://patchwork.kernel.org/patch/11505139/" driver is
> allocating the DMA-able memory for RDPQ's in sets of 16 reply queues
> using limitation of Ventura series controller.
>
> With 32455 queue depth and above patch, Driver may request a large DMA-able
> memory where the kernel may fail to allocate.
>
> To confirm this, Please try by tuning the queue depth to 8000/10000 using the
> module parameter "mpt3sas.max_queue_depth=10000".
>
> Thanks,
> Suganath
>
>
> On Wed, Sep 30, 2020 at 7:22 PM Suganath Prabu Subramani
> <suganath-prabu.subramani@xxxxxxxxxxxx> wrote:
> >
> > Hi Sundar,
> >
> > Thanks for the logs,
> > From log, i could see that HBA queue depth is very high "32455" as shown below.
> > [ 11.465416] mpt2sas_cm0: hba queue depth(32455), max chains per io(128).
> >
> > In this patch "https://patchwork.kernel.org/patch/11505139/" driver is allocating the
> > DMA-able memory for RDPQ's in sets of 16 reply queues using limitation of Ventura
> > series controller.
> >
> > With 32455 queue depth and above patch driver may request a large DMA-able
> > memory where kernel may fail to allocate.
> >
> > To confirm this, Please try by tuning the queue depth to 8000/10000 using the
> > module parameter "mpt3sas.max_queue_depth=10000".
> > > > Thanks, > > Suganath > > > > On Wed, Sep 30, 2020 at 1:34 AM Sundar Nagarajan <sun.nagarajan@xxxxxxxxx> wrote: > >> > >> Thanks for your suggestions. > >> > >> I downloaded and used stock kernel 5.8.12 from kernel.org. > >> The two patches you pointed at are already applied in 5.8.12 (as you > >> had indicated). > >> > >> The problem still exists. > >> EDITED dmesg below, full dmesg output attached > >> I have also updated my kernel bugzilla report: > >> https://bugzilla.kernel.org/show_bug.cgi?id=209177 > >> > >> > >> [ 10.110816] mpt2sas_cm0: mpt3sas_base_attach > >> [ 10.110913] dca service started, version 1.12.1 > >> [ 10.122668] mpt2sas_cm0: mpt3sas_base_map_resources > >> [ 10.140735] usb 2-1.7: New USB device found, idVendor=1546, > >> idProduct=01a6, bcdDevice= 7.03 > >> [ 10.147693] scsi host2: ahci > >> [ 10.163432] usb 2-1.7: New USB device strings: Mfr=1, Product=2, > >> SerialNumber=0 > >> [ 10.173819] mpt2sas_cm0: 64 BIT PCI BUS DMA ADDRESSING SUPPORTED, > >> total mem (197972228 kB) > >> [ 10.189366] usb 2-1.7: Product: u-blox 6 - GPS Receiver > >> [ 10.206466] mpt2sas_cm0: _base_get_ioc_facts > >> [ 10.219986] usb 2-1.7: Manufacturer: u-blox AG - www.u-blox.com > >> [ 10.246805] mpt2sas_cm0: _base_wait_for_iocstate > >> [ 10.260177] scsi host3: ahci > >> [ 10.271074] scsi host4: ahci > >> [ 10.281958] scsi host5: ahci > >> [ 10.292565] scsi host6: ahci > >> [ 10.299138] usb 2-1.8: new full-speed USB device number 6 using ehci-pci > >> [ 10.303153] scsi host7: ahci > >> [ 10.328158] ata1: SATA max UDMA/133 abar m2048@0xd1700000 port > >> 0xd1700100 irq 53 > >> [ 10.343989] ata2: SATA max UDMA/133 abar m2048@0xd1700000 port > >> 0xd1700180 irq 53 > >> [ 10.359546] ata3: SATA max UDMA/133 abar m2048@0xd1700000 port > >> 0xd1700200 irq 53 > >> [ 10.374807] ata4: SATA max UDMA/133 abar m2048@0xd1700000 port > >> 0xd1700280 irq 53 > >> [ 10.389813] ata5: SATA max UDMA/133 abar m2048@0xd1700000 port > >> 0xd1700300 irq 53 > >> [ 10.404635] ata6: 
SATA max UDMA/133 abar m2048@0xd1700000 port > >> 0xd1700380 irq 53 > >> [ 10.412371] scsi 0:0:0:0: Direct-Access SanDisk Ultra Fit > >> 1.00 PQ: 0 ANSI: 6 > >> [ 10.433718] usb 2-1.8: New USB device found, idVendor=051d, > >> idProduct=0003, bcdDevice= 1.06 > >> [ 10.435546] sd 0:0:0:0: Attached scsi generic sg0 type 0 > >> [ 10.450887] usb 2-1.8: New USB device strings: Mfr=1, Product=2, > >> SerialNumber=3 > >> [ 10.464152] offset:data > >> [ 10.478544] usb 2-1.8: Product: Smart-UPS 2200 FW:UPS 06.3 / MCU 11.0 > >> [ 10.488004] mpt2sas_cm0: [0x00]:03100200 > >> [ 10.488004] mpt2sas_cm0: [0x04]:00002300 > >> [ 10.488005] mpt2sas_cm0: [0x08]:00000000 > >> [ 10.488005] mpt2sas_cm0: [0x0c]:00000000 > >> [ 10.488006] mpt2sas_cm0: [0x10]:00000000 > >> [ 10.488007] mpt2sas_cm0: [0x14]:00010080 > >> [ 10.488007] mpt2sas_cm0: [0x18]:22137ec7 > >> [ 10.488008] mpt2sas_cm0: [0x1c]:0001285c > >> [ 10.488017] mpt2sas_cm0: [0x20]:14000600 > >> [ 10.501945] usb 2-1.8: Manufacturer: American Power Conversion > >> [ 10.501961] usb 2-1.8: SerialNumber: JS1051006712 > >> [ 10.513140] mpt2sas_cm0: [0x24]:00000020 > >> [ 10.513140] mpt2sas_cm0: [0x28]:04000020 > >> [ 10.513141] mpt2sas_cm0: [0x2c]:00810080 > >> [ 10.513141] mpt2sas_cm0: [0x30]:007f0003 > >> [ 10.513142] mpt2sas_cm0: [0x34]:0020ffe0 > >> [ 10.513154] mpt2sas_cm0: [0x38]:008004b0 > >> [ 10.513154] mpt2sas_cm0: [0x3c]:00000011 > >> [ 10.513155] mpt2sas_cm0: [0x40]:00000000 > >> [ 10.513156] mpt2sas_cm0: CurrentHostPageSize is 0: Setting default > >> host page size to 4k > >> [ 10.524350] sd 0:0:0:0: [sda] 30031250 512-byte logical blocks: > >> (15.4 GB/14.3 GiB) > >> [ 10.535178] mpt2sas_cm0: CurrentHostPageSize(0) > >> [ 10.548205] sd 0:0:0:0: [sda] Write Protect is off > >> [ 10.556610] mpt2sas_cm0: hba queue depth(32455), max chains per io(128) > >> [ 10.566972] sd 0:0:0:0: [sda] Mode Sense: 43 00 00 00 > >> [ 10.577132] mpt2sas_cm0: request frame size(128), reply frame size(128) > >> [ 10.589074] sd 0:0:0:0: [sda] 
Write cache: disabled, read cache: > >> enabled, doesn't support DPO or FUA > >> [ 10.597175] mpt2sas_cm0: msix is supported, vector_count(1) > >> [ 10.692084] hid: raw HID events driver (C) Jiri Kosina > >> [ 10.692148] igb: Intel(R) Gigabit Ethernet Network Driver - version 5.6.0-k > >> [ 10.692149] igb: Copyright (c) 2007-2014 Intel Corporation. > >> [ 10.705215] mpt2sas_cm0: MSI-X vectors supported: 1 > >> [ 10.705216] no of cores: 32, max_msix_vectors: -1 > >> [ 10.705217] mpt2sas_cm0: 0 1 > >> [ 10.705359] mpt2sas_cm0: High IOPs queues : disabled > >> [ 10.757534] ata4: SATA link down (SStatus 0 SControl 300) > >> [ 10.761609] mpt2sas0-msix0: PCI-MSI-X enabled: IRQ 56 > >> [ 10.761611] mpt2sas_cm0: iomem(0x00000000d1380000), > >> mapped(0x(____ptrval____)), size(16384) > >> [ 10.761613] mpt2sas_cm0: ioport(0x0000000000002000), size(256) > >> [ 10.781648] ata1: SATA link down (SStatus 0 SControl 300) > >> [ 10.793026] mpt2sas_cm0: _base_get_ioc_facts > >> [ 10.804281] ata6: SATA link down (SStatus 0 SControl 300) > >> [ 10.817492] mpt2sas_cm0: _base_wait_for_iocstate > >> [ 10.821742] usbcore: registered new interface driver usbhid > >> [ 10.821743] usbhid: USB HID core driver > >> [ 10.829361] ata3: SATA link down (SStatus 0 SControl 300) > >> [ 10.906674] offset:data > >> [ 10.917639] ata5: SATA link down (SStatus 0 SControl 300) > >> [ 10.917791] input: American Megatrends Inc. Virtual Keyboard and > >> Mouse as /devices/pci0000:00/0000:00:1d.0/usb2/2-1/2-1.4/2-1.4:1.0/0003:046B:FF10.0001/input/input2 > >> [ 10.917893] hid-generic 0003:046B:FF10.0001: input,hidraw0: USB HID > >> v1.10 Keyboard [American Megatrends Inc. Virtual Keyboard and Mouse] > >> on usb-0000:00:1d.0-1.4/input0 > >> [ 10.918019] input: American Megatrends Inc. 
Virtual Keyboard and > >> Mouse as /devices/pci0000:00/0000:00:1d.0/usb2/2-1/2-1.4/2-1.4:1.1/0003:046B:FF10.0002/input/input3 > >> [ 10.918245] hid-generic 0003:046B:FF10.0002: input,hidraw1: USB HID > >> v1.10 Mouse [American Megatrends Inc. Virtual Keyboard and Mouse] on > >> usb-0000:00:1d.0-1.4/input1 > >> [ 10.918692] hid-generic 0003:051D:0003.0003: hiddev0,hidraw2: USB > >> HID v1.00 Device [American Power Conversion Smart-UPS 2200 FW:UPS 06.3 > >> / MCU 11.0] on usb-0000:00:1d.0-1.8/input0 > >> [ 10.925117] random: fast init done > >> [ 10.929067] mpt2sas_cm0: [0x00]:03100200 > >> [ 10.939600] ata2: SATA link down (SStatus 0 SControl 300) > >> [ 10.951294] mpt2sas_cm0: [0x04]:00002300 > >> [ 10.984639] sda: sda1 sda2 sda3 > >> [ 10.985180] mpt2sas_cm0: [0x08]:00000000 > >> [ 11.005873] sd 0:0:0:0: [sda] Attached SCSI removable disk > >> [ 11.006343] mpt2sas_cm0: [0x0c]:00000000 > >> [ 11.285853] mpt2sas_cm0: [0x10]:00000000 > >> [ 11.298311] mpt2sas_cm0: [0x14]:00010080 > >> [ 11.310617] mpt2sas_cm0: [0x18]:22137ec7 > >> [ 11.322831] mpt2sas_cm0: [0x1c]:0001285c > >> [ 11.334964] mpt2sas_cm0: [0x20]:14000600 > >> [ 11.347072] mpt2sas_cm0: [0x24]:00000020 > >> [ 11.359060] mpt2sas_cm0: [0x28]:04000020 > >> [ 11.370880] mpt2sas_cm0: [0x2c]:00810080 > >> [ 11.382482] mpt2sas_cm0: [0x30]:007f0003 > >> [ 11.393927] mpt2sas_cm0: [0x34]:0020ffe0 > >> [ 11.405226] mpt2sas_cm0: [0x38]:008004b0 > >> [ 11.416400] mpt2sas_cm0: [0x3c]:00000011 > >> [ 11.427427] mpt2sas_cm0: [0x40]:00000000 > >> [ 11.438335] mpt2sas_cm0: CurrentHostPageSize is 0: Setting default > >> host page size to 4k > >> [ 11.453888] mpt2sas_cm0: CurrentHostPageSize(0) > >> [ 11.465416] mpt2sas_cm0: hba queue depth(32455), max chains per io(128) > >> [ 11.479358] mpt2sas_cm0: request frame size(128), reply frame size(128) > >> [ 11.493291] mpt2sas_cm0: _base_make_ioc_ready > >> [ 11.507135] mpt2sas_cm0: _base_get_port_facts > >> [ 11.519349] igb 0000:07:00.0: added PHC on eth0 > >> [ 11.530468] igb 
0000:07:00.0: Intel(R) Gigabit Ethernet Network Connection > >> [ 11.544129] igb 0000:07:00.0: eth0: (PCIe:5.0Gb/s:Width x4) 00:1e:67:97:4d:e9 > >> [ 11.558034] igb 0000:07:00.0: eth0: PBA No: 100000-000 > >> [ 11.569355] igb 0000:07:00.0: Using MSI-X interrupts. 8 rx > >> queue(s), 8 tx queue(s) > >> [ 11.616691] offset:data > >> [ 11.624765] mpt2sas_cm0: [0x00]:05070000 > >> [ 11.634321] mpt2sas_cm0: [0x04]:00000000 > >> [ 11.643579] mpt2sas_cm0: [0x08]:00000000 > >> [ 11.652537] mpt2sas_cm0: [0x0c]:00000000 > >> [ 11.661248] mpt2sas_cm0: [0x10]:00000000 > >> [ 11.669892] mpt2sas_cm0: [0x14]:00003000 > >> [ 11.678382] mpt2sas_cm0: [0x18]:00000100 > >> [ 11.686741] mpt2sas_cm0: _base_allocate_memory_pools > >> [ 11.696171] mpt2sas_cm0: scatter gather: sge_in_main_msg(1), > >> sge_per_chain(9), sge_per_io(128), chains_per_io(15) > >> [ 11.715890] ------------[ cut here ]------------ > >> [ 11.725227] WARNING: CPU: 0 PID: 5 at mm/page_alloc.c:4831 > >> __alloc_pages_nodemask+0x1ce/0x310 > >> [ 11.739330] Modules linked in: fjes(-) hid_generic usbhid hid > >> crct10dif_pclmul igb(+) crc32_pclmul ghash_clmulni_intel dca > >> aesni_intel ptp ahci crypto_simd mpt3sas(+) pps_core xhci_pci cryptd > >> mlx4_core(+) raid_class i2c_algo_bit libahci xhci_pci_renesas > >> glue_helper scsi_transport_sas wmi uas usb_storage deflate > >> [ 11.791023] CPU: 0 PID: 5 Comm: kworker/0:0 Not tainted 5.8.12 #1 > >> [ 11.803622] Hardware name: ZTSYSTEM CYPRESS11 /S2600CP , > >> BIOS SE5C600.86B.02.06.0006.032420170950 03/24/2017 > >> [ 11.827610] Workqueue: events work_for_cpu_fn > >> [ 11.838884] RIP: 0010:__alloc_pages_nodemask+0x1ce/0x310 > >> [ 11.851367] Code: ff ff ff 65 48 8b 04 25 c0 7b 01 00 48 05 78 08 > >> 00 00 41 bd 01 00 00 00 48 89 44 24 08 e9 05 ff ff ff 81 e7 00 20 00 > >> 00 75 02 <0f> 0b 45 31 ed eb 95 44 8b 64 24 18 65 8b 05 1f a6 7a 4b 89 > >> c0 48 > >> [ 11.893686] RSP: 0018:ffffc18e000bbc98 EFLAGS: 00010246 > >> [ 11.906822] RAX: 0000000000000000 RBX: 
0000000000000cc0 RCX: 0000000000000000 > >> [ 11.922228] RDX: 0000000000000000 RSI: 000000000000000b RDI: 0000000000000000 > >> [ 11.937510] RBP: 000000000075d000 R08: 000000000075d000 R09: ffffffffffffffff > >> [ 11.952755] R10: 0000000000000000 R11: ffff9e6a16c22350 R12: ffffffffffffffff > >> [ 11.967942] R13: 0000000000000000 R14: ffff9e5215c34f58 R15: ffff9e52163590b0 > >> [ 11.983165] FS: 0000000000000000(0000) GS:ffff9e521ea00000(0000) > >> knlGS:0000000000000000 > >> [ 11.999566] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 > >> [ 12.013320] CR2: 000055c7853e9ef0 CR3: 00000003d620a003 CR4: 00000000000606f0 > >> [ 12.028719] Call Trace: > >> [ 12.038777] dma_direct_alloc_pages+0x171/0x2a0 > >> [ 12.051185] dma_pool_alloc+0xd0/0x1c0 > >> [ 12.062585] base_alloc_rdpq_dma_pool+0x118/0x1d0 [mpt3sas] > >> [ 12.076131] _base_allocate_memory_pools+0x2d6/0x1240 [mpt3sas] > >> [ 12.090232] mpt3sas_base_attach+0x4a4/0x930 [mpt3sas] > >> [ 12.103599] _scsih_probe+0x4e3/0x920 [mpt3sas] > >> [ 12.116383] local_pci_probe+0x42/0x90 > >> [ 12.128401] work_for_cpu_fn+0x16/0x20 > >> [ 12.140466] process_one_work+0x208/0x400 > >> [ 12.152910] worker_thread+0x221/0x3e0 > >> [ 12.165053] ? process_one_work+0x400/0x400 > >> [ 12.177573] kthread+0x117/0x130 > >> [ 12.188759] ? kthread_park+0x90/0x90 > >> [ 12.200400] ret_from_fork+0x22/0x30 > >> [ 12.211748] ---[ end trace 1d2f9a5394100a7e ]--- > >> [ 12.224134] mpt2sas_cm0: mpt3sas_base_free_resources > >> [ 12.237582] mpt2sas_cm0: _base_make_ioc_ready > >> [ 12.249253] mpt2sas_cm0: mpt3sas_base_unmap_resources > >> [ 12.264417] igb 0000:07:00.1: added PHC on eth1 > >> [ 12.276024] igb 0000:07:00.1: Intel(R) Gigabit Ethernet Network Connection > >> [ 12.290184] igb 0000:07:00.1: eth1: (PCIe:5.0Gb/s:Width x4) 00:1e:67:97:4d:ea > >> [ 12.304604] igb 0000:07:00.1: eth1: PBA No: 100000-000 > >> [ 12.316624] igb 0000:07:00.1: Using MSI-X interrupts. 
8 rx > >> queue(s), 8 tx queue(s) > >> [ 12.331505] mpt2sas_cm0: _base_release_memory_pools > >> [ 12.343209] mpt2sas_cm0: failure at > >> drivers/scsi/mpt3sas/mpt3sas_scsih.c:10791/_scsih_probe()! > >> > >> On Tue, Sep 29, 2020 at 8:00 AM Suganath Prabu Subramani > >> <suganath-prabu.subramani@xxxxxxxxxxxx> wrote: > >> > > >> > Hi Sundar, > >> > > >> > Please check if below two patches are available in the mpt3sas driver > >> > you are using. > >> > If you are seeing issues with these patches applied (Or) If your > >> > driver is already having mentioned patches, provide us driver log with > >> > "mpt3sas.logging_level=0x3f8”. > >> > > >> > https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/drivers/scsi/mpt3sas?h=v5.9-rc4&id=61e6ba03ea26f0205e535862009ff6ffdbf4de0c > >> > https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/drivers/scsi/mpt3sas?h=v5.9-rc4&id=f56577e8c7d0f3054f97d1f0d1cbe9a4d179cc47 > >> > > >> > I could see these patches in 5.8.12 > >> > https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/tree/drivers/scsi/mpt3sas/mpt3sas_base.c?h=v5.8.12. > >> > > >> > Thanks, > >> > Suganath > >> > > >> > > >> > On Tue, Sep 29, 2020 at 4:18 PM Sundar Nagarajan > >> > <sun.nagarajan@xxxxxxxxx> wrote: > >> > > > >> > > Sorry if I am mailing too many people. > >> > > Copying additional people in the hope that someone has the time to guide me on how to report, debug and fix this bug in the 5.8 kernel. > >> > > > >> > > bugzilla.kernel org bug report: > >> > > https://bugzilla.kernel.org/show_bug.cgi?id=209177 > >> > > > >> > > > >> > > > >> > > > >> > > On Tue, Sep 22, 2020 at 7:08 PM Sundar Nagarajan <sun.nagarajan@xxxxxxxxx> wrote: > >> > >> > >> > >> Any guidance on how I should go about trying with the 35.100.00.00 driver? > >> > >> In particular: > >> > >> > >> > >> Which patch do I apply? > >> > >> Which kernel version do I apply the patch to? 
> >> > >>
> >> > >> Regards,
> >> > >> Sundar
> >> > >>
> >> > >>
> >> > >> On Thu, Sep 10, 2020 at 10:51 PM Sundar Nagarajan <sun.nagarajan@xxxxxxxxx> wrote:
> >> > >>>
> >> > >>> Hi Suganath,
> >> > >>>
> >> > >>> Thank you for the quick reply.
> >> > >>>
> >> > >>> I am a bit of a newbie at applying Linux kernel patches etc.
> >> > >>>
> >> > >>> Would I apply this patch to the stock (5.8.8) kernel.org kernel:
> >> > >>> https://git.kernel.org/pub/scm/linux/kernel/git/mkp/scsi.git/commit/?h=5.10/scsi-queue
> >> > >>>
> >> > >>> Sundar
> >> > >>>
> >> > >>>
> >> > >>>
> >> > >>> On Thu, Sep 10, 2020 at 10:46 PM Suganath Prabu Subramani <suganath-prabu.subramani@xxxxxxxxxxxx> wrote:
> >> > >>>>
> >> > >>>> Hi Sundar,
> >> > >>>>
> >> > >>>> Can you please try with the latest driver 35.100.00.00. => "https://git.kernel.org/pub/scm/linux/kernel/git/mkp/scsi.git/tree/?h=5.10/scsi-queue"
> >> > >>>> This has fixes related to "RDPQ" scsi: mpt3sas: Fix reply queue count in non RDPQ mode.
> >> > >>>> scsi: mpt3sas: Fix memset() in non-RDPQ mode.
> >> > >>>>
> >> > >>>> Thanks,
> >> > >>>> Suganath
> >> > >>>>
> >> > >>>> On Fri, Sep 11, 2020 at 10:00 AM Sundar Nagarajan <sun.nagarajan@xxxxxxxxx> wrote:
> >> > >>>>>
> >> > >>>>> I am new to reporting Linux kernel bugs.
> >> > >>>>> Apologies if this is sent to you in error.
> >> > >>>>> I got your email using: `perl scripts/get_maintainer.pl -f
> >> > >>>>> drivers/scsi/mpt3sas/mpt3sas_scsih.c` as indicated in
> >> > >>>>> https://www.kernel.org/doc/html/latest/admin-guide/reporting-bugs.html
> >> > >>>>>
> >> > >>>>> bugzilla.kernel.org bug report:
> >> > >>>>> https://bugzilla.kernel.org/show_bug.cgi?id=209177
Attachment:
dmesg.mpt3sas.max_queue_depth.20200930
Description: Binary data
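
For anyone finding this thread later, the "request too large for the kernel to allocate" explanation can be sanity-checked with a little arithmetic. The sketch below is only an illustration and rests on assumptions that are not stated in the thread: 4 KiB pages, the x86-64 default MAX_ORDER of 11 (so order 10, i.e. 4 MiB, is the largest contiguous block the page allocator will hand out), and that the value 0x75d000 seen in RBP/R08 of the register dump is the size of the failing request. The helper names are made up for this example.

```python
PAGE_SIZE = 4096  # assumption: 4 KiB pages (x86-64 default)
MAX_ORDER = 11    # assumption: kernel default here; valid orders are 0..10


def alloc_order(size_bytes: int, page_size: int = PAGE_SIZE) -> int:
    """Smallest order such that (page_size << order) >= size_bytes."""
    pages = -(-size_bytes // page_size)  # ceiling division
    return max(pages - 1, 0).bit_length()


def fits_page_allocator(size_bytes: int, max_order: int = MAX_ORDER) -> bool:
    """True if a contiguous request of this size stays below MAX_ORDER."""
    return alloc_order(size_bytes) < max_order


# Assumed request size, taken from RBP/R08 in the trace (about 7.4 MiB).
requested = 0x75D000

print(alloc_order(requested))          # 11
print(fits_page_allocator(requested))  # False
print(fits_page_allocator(4 << 20))    # True: exactly 4 MiB is order 10
```

Under these assumptions the request needs order 11, which matches RSI (0xb) in the register dump if that register holds the order argument, and is one step past what the allocator accepts. Lowering max_queue_depth shrinks the reply queue pools, which presumably brings the request back under the 4 MiB limit.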