On Wed, Mar 13, 2024 at 02:07:13AM +1100, Imran Khan wrote:
> [ Upstream commit 9fb9eb4b59acc607e978288c96ac7efa917153d4 ]

No it is not.

>
> systems, using igb driver, crash while executing poweroff command
> as per following call stack:
>
> crash> bt -a
> PID: 62583 TASK: ffff97ebbf28dc40 CPU: 0 COMMAND: "poweroff"
> #0 [ffffa7adcd64f8a0] machine_kexec at ffffffffa606c7c1
> #1 [ffffa7adcd64f900] __crash_kexec at ffffffffa613bb52
> #2 [ffffa7adcd64f9d0] panic at ffffffffa6099c45
> #3 [ffffa7adcd64fa50] oops_end at ffffffffa603359a
> #4 [ffffa7adcd64fa78] die at ffffffffa6033c32
> #5 [ffffa7adcd64faa8] do_trap at ffffffffa60309a0
> #6 [ffffa7adcd64faf8] do_error_trap at ffffffffa60311e7
> #7 [ffffa7adcd64fbc0] do_invalid_op at ffffffffa6031320
> #8 [ffffa7adcd64fbd0] invalid_op at ffffffffa6a01f2a
> [exception RIP: free_msi_irqs+408]
> RIP: ffffffffa645d248 RSP: ffffa7adcd64fc88 RFLAGS: 00010286
> RAX: ffff97eb1396fe00 RBX: 0000000000000000 RCX: ffff97eb1396fe00
> RDX: ffff97eb1396fe00 RSI: 0000000000000000 RDI: 0000000000000000
> RBP: ffffa7adcd64fcb0 R8: 0000000000000002 R9: 000000000000fbff
> R10: 0000000000000000 R11: 0000000000000000 R12: ffff98c047af4720
> R13: ffff97eb87cd32a0 R14: ffff97eb87cd3000 R15: ffffa7adcd64fd57
> ORIG_RAX: ffffffffffffffff CS: 0010 SS: 0018
> #9 [ffffa7adcd64fc80] free_msi_irqs at ffffffffa645d0fc
> #10 [ffffa7adcd64fcb8] pci_disable_msix at ffffffffa645d896
> #11 [ffffa7adcd64fce0] igb_reset_interrupt_capability at ffffffffc024f335 [igb]
> #12 [ffffa7adcd64fd08] __igb_shutdown at ffffffffc0258ed7 [igb]
> #13 [ffffa7adcd64fd48] igb_shutdown at ffffffffc025908b [igb]
> #14 [ffffa7adcd64fd70] pci_device_shutdown at ffffffffa6441e3a
> #15 [ffffa7adcd64fd98] device_shutdown at ffffffffa6570260
> #16 [ffffa7adcd64fdc8] kernel_power_off at ffffffffa60c0725
> #17 [ffffa7adcd64fdd8] SYSC_reboot at ffffffffa60c08f1
> #18 [ffffa7adcd64ff18] sys_reboot at ffffffffa60c09ee
> #19 [ffffa7adcd64ff28] do_syscall_64 at ffffffffa6003ca9
> #20 [ffffa7adcd64ff50] entry_SYSCALL_64_after_hwframe at ffffffffa6a001b1
>
> This happens because igb_shutdown has not yet freed up allocated irqs and
> free_msi_irqs finds irq_has_action true for involved msi irqs here and this
> condition triggers BUG_ON.
>
> Freeing irqs before proceeding further in igb_clear_interrupt_scheme,
> fixes this problem.
>
> Signed-off-by: Imran Khan <imran.f.khan@xxxxxxxxxx>
> ---
>
> This issue does not happen in v5.17 or later kernel versions because
> 'commit 9fb9eb4b59ac ("PCI/MSI: Let core code free MSI descriptors")',
> explicitly frees up MSI based irqs and hence indirectly fixes this issue
> as well. Also this is why I have mentioned this commit as equivalent
> upstream commit. But this upstream change itself is dependent on a bunch
> of changes starting from 'commit 288c81ce4be7 ("PCI/MSI: Move code into a
> separate directory")', which refactored msi driver into multiple parts.
> So another way of fixing this issue would be to backport these patches and
> get this issue implicitly fixed.
> Kindly let me know if my current patch is not acceptable and in that case
> will it be fine if I backport the above mentioned msi driver refactoring
> patches to LTS.

What would the real patch series look like? How bad are the backports?
Try that out first please.

thanks,

greg k-h