On Wed, 2022-01-12 at 15:01 +0100, mwilck@xxxxxxxx wrote:
> From: Martin Wilck <mwilck@xxxxxxxx>
>
> I observe the watchdog timer being triggered while unloading the
> mpt3sas driver:
>
> Jan 12 12:25:51 tegmen kernel: mpt2sas_cm0: mpt3sas_base_detach
> Jan 12 12:25:51 tegmen kernel: mpt2sas_cm0: mpt3sas_base_free_resources
> Jan 12 12:25:51 tegmen kernel: mpt2sas_cm0: mpt3sas_base_make_ioc_ready
> Jan 12 12:25:51 tegmen kernel: mpt2sas_cm0: sending message unit reset !!
> Jan 12 12:25:51 tegmen kernel: mpt2sas_cm0: message unit reset: SUCCESS
> Jan 12 12:25:51 tegmen kernel: mpt2sas_cm0: mpt3sas_base_unmap_resources
> Jan 12 12:25:51 tegmen kernel: mpt2sas_cm0: _base_release_memory_pools
> Jan 12 12:25:51 tegmen kernel: mpt2sas_cm0: request_pool(0x00000000144b1531): free
> Jan 12 12:25:51 tegmen kernel: mpt2sas_cm0: sense_pool(0x000000009665c238): free
> Jan 12 12:25:52 tegmen kernel: mpt2sas_cm0: reply_pool(0x000000005c5e0fa5): free
> Jan 12 12:25:52 tegmen kernel: mpt2sas_cm0: reply_free_pool(0x000000006f897f6c): free
> Jan 12 12:25:52 tegmen kernel: mpt2sas_cm0: reply_post_free_pool(0x00000000d1edc4aa): free
> Jan 12 12:25:52 tegmen kernel: mpt2sas_cm0: config_page(0x000000009f651842): free
> Jan 12 12:26:23 tegmen kernel: watchdog: BUG: soft lockup - CPU#27 stuck for 26s! [rmmod:2594]
> Jan 12 12:26:23 tegmen kernel: Hardware name: HP ProLiant DL560 Gen8, BIOS P77 05/24/2019
> Jan 12 12:26:23 tegmen kernel: RIP: 0010:_raw_spin_unlock_irqrestore+0x26/0x2e
> Jan 12 12:26:23 tegmen kernel: Code: 1f 44 00 00 0f 1f 44 00 00 c6 07 00 0f 1f 40 00 f7 c6 00 02 00 00 75 0b 65 ff 0d 05 ce a1 5f 74 0>
> Jan 12 12:26:23 tegmen kernel: RSP: 0018:ffffab1546bdfcc8 EFLAGS: 00000206
> Jan 12 12:26:23 tegmen kernel: RAX: 0000000000000c80 RBX: ffff8d82b0f16700 RCX: 0000000000000d00
> Jan 12 12:26:23 tegmen kernel: RDX: 0000000453642d00 RSI: 0000000000000282 RDI: ffff8d8292075f90
> Jan 12 12:26:23 tegmen kernel: RBP: ffff8d8292075f80 R08: 0000000000000000 R09: 0000000000000001
> Jan 12 12:26:23 tegmen kernel: R10: 0000000000000003 R11: ffff8d8284256a00 R12: ffff8d8293642d00
> Jan 12 12:26:23 tegmen kernel: R13: ffff8d8292075f90 R14: 0000000000000282 R15: 0000000000000d00
> Jan 12 12:26:23 tegmen kernel: FS: 00007fbd96388740(0000) GS:ffff8d8e7f6c0000(0000) knlGS:0000000000000000
> Jan 12 12:26:23 tegmen kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> Jan 12 12:26:23 tegmen kernel: CR2: 000055bbd50f9918 CR3: 0000000c80b0c001 CR4: 00000000000606e0
> Jan 12 12:26:23 tegmen kernel: Call Trace:
> Jan 12 12:26:23 tegmen kernel:  <TASK>
> Jan 12 12:26:23 tegmen kernel:  dma_pool_free+0xc1/0x100
> Jan 12 12:26:23 tegmen kernel:  _base_release_memory_pools+0x343/0x4c0 [mpt3sas 6ff0715b1f6f07c16051cb2772836069b2821b01]
> Jan 12 12:26:23 tegmen kernel:  mpt3sas_base_detach+0x2e/0x130 [mpt3sas 6ff0715b1f6f07c16051cb2772836069b2821b01]
>
> When the driver is unloaded during system shutdown, this may actually
> cause a kernel panic triggered by the watchdog.
>
> The problem is that with the hardware in question, the driver
> allocates a very large number of DMA buffers for chain lookup
> (scsiio_depth = 29868, chains_needed_per_io = 15, total number of
> buffers = 448020). The loop that frees all DMA buffers takes ~30s to
> execute. By adding a cond_resched() in the loop, the watchdog is
> avoided.
>
> Note: This is the 2nd issue I saw with this controller and the
> reported can_queue value after
> https://lore.kernel.org/linux-scsi/Ydug9nWg4loEVkJw@T590/T/
>
> Fixes: 93204b782a88 ("scsi: mpt3sas: Lockless access for chain buffers.")
> Signed-off-by: Martin Wilck <mwilck@xxxxxxxx>
> CC: Sathya Prakash <sathya.prakash@xxxxxxxxxxxx>
> Cc: Sreekanth Reddy <sreekanth.reddy@xxxxxxxxxxxx>
> Cc: Suganath Prabu Subramani <suganath-prabu.subramani@xxxxxxxxxxxx>
> Cc: MPT-FusionLinux.pdl@xxxxxxxxxxxx
> ---
>  drivers/scsi/mpt3sas/mpt3sas_base.c | 1 +
>  1 file changed, 1 insertion(+)
>
> diff --git a/drivers/scsi/mpt3sas/mpt3sas_base.c b/drivers/scsi/mpt3sas/mpt3sas_base.c
> index 81dab9b82f79..943ea7e0fef0 100644
> --- a/drivers/scsi/mpt3sas/mpt3sas_base.c
> +++ b/drivers/scsi/mpt3sas/mpt3sas_base.c
> @@ -5715,6 +5715,7 @@ _base_release_memory_pools(struct MPT3SAS_ADAPTER *ioc)
>  						ct->chain_buffer_dma);
>  			}
>  			kfree(ioc->chain_lookup[i].chains_per_smid);
> +			cond_resched();
>  		}
>  		dma_pool_destroy(ioc->chain_dma_pool);
>  		kfree(ioc->chain_lookup);
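
For anyone checking the numbers in the commit message: 29868 * 15 does give 448020 buffers, and ~30s spread over that many frees is roughly 67 us per buffer on average. Below is a rough userspace sketch of the loop shape, not the driver code. The function names are made up for illustration, the inner loop is assumed to start at 0 so that it matches the quoted total, and sched_yield() is only a loose userspace analogue of cond_resched().

```c
#include <stdint.h>
#include <sched.h>

/* Total chain buffers = queue depth * chains per I/O (figures from the report). */
static uint64_t total_chain_buffers(uint32_t scsiio_depth,
				    uint32_t chains_needed_per_io)
{
	return (uint64_t)scsiio_depth * chains_needed_per_io;
}

/*
 * Hypothetical stand-in for the free loop: the inner loop "frees" one
 * buffer per iteration, and the outer loop offers the CPU to other tasks
 * once per I/O slot, mirroring where the patch places cond_resched().
 * Returns the number of buffers visited; *yields counts the yield points.
 */
static uint64_t free_all_chains(uint32_t scsiio_depth,
				uint32_t chains_needed_per_io,
				uint32_t *yields)
{
	uint64_t freed = 0;
	uint32_t i, j;

	for (i = 0; i < scsiio_depth; i++) {
		for (j = 0; j < chains_needed_per_io; j++)
			freed++;	/* stands in for dma_pool_free() */
		sched_yield();		/* userspace analogue, an assumption */
		(*yields)++;
	}
	return freed;
}
```

With the values from the report, the sketch has one yield point per scsiio_depth entry (29868 in total), so at most 15 frees run between two opportunities to reschedule.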