Re: [RFC] PCI/MSI: Warning observed for NVMe with ACPI

Hi Jon,

On Fri, 10 Dec 2021 10:48:22 +0000,
Jon Hunter <jonathanh@xxxxxxxxxx> wrote:
> 
> Hi all,
> 
> Since Linux v5.13, we have noticed the following warning splat when
> booting Tegra (ARM64) with ACPI ...
> 
> [    2.725479] WARNING: CPU: 0 PID: 94 at include/linux/msi.h:264 free_msi_irqs+0x84/0x188
> [    2.736137] Modules linked in:
> [    2.736147] CPU: 0 PID: 94 Comm: kworker/u16:1 Tainted: G        W         5.12.0-rc2-00008-g658376bd3e5-dirty #36
> [    2.736160] Workqueue: nvme-reset-wq nvme_reset_work
> [    2.746470] pstate: 60400009 (nZCv daif +PAN -UAO -TCO BTYPE=--)
> [    2.757713] pc : free_msi_irqs+0x84/0x188
> [    2.757726] lr : __pci_enable_msix_range+0x380/0x530
> [    2.757735] sp : ffff800012813b00
> [    2.757739] x29: ffff800012813b00
> [    2.768371] x28: 00000000ffffffed
> [    2.768382] x27: 0000000000000001 x26: 0000000000000000
> [    2.768393] x25: ffff0000809362e8 x24: 0000000000000000
> [    2.768407] x23: 000000000000000c x22: ffff000080936000
> [    2.768418] x21: ffff0000809362e8 x20: ffff0000809362e8
> [    2.775320] x19: ffff000080936000
> [    2.785950] x18: ffffffffffffffff
> [    2.785961] x17: 0000000000000007 x16: 0000000000000001
> [    2.785975] x15: ffff800011bf9948
> [    2.793997] x14: ffff8000928137e7
> [    2.794009] x13: ffff8000128137f5 x12: ffff800011c19640
> [    2.794023] x11: fffffffffffe5788 x10: 0000000005f5e0ff
> [    2.794034] x9 : 00000000ffffffd0 x8 : 203a737542204f49
> [    2.803737] x7 : 444d206465786946 x6 : ffff800011ee1fd7
> [    2.803750] x5 : 0000000000000000 x4 : 0000000000000000
> [    2.815286] x3 : 00000000ffffffff x2 : ffff0000809362e8
> [    2.815300] x1 : ffff0000809362e8 x0 : 0000000000000000
> [    2.825270] Call trace:
> [    2.825275]  free_msi_irqs+0x84/0x188
> [    2.825288]  __pci_enable_msix_range+0x380/0x530
> [    2.825299]  pci_alloc_irq_vectors_affinity+0x158/0x168
> [    2.825309]  nvme_reset_work+0x214/0x15b8
> [    2.829340] dwc-eth-dwmac NVDA1160:00: SPH feature enabled
> [    2.832986]  process_one_work+0x1cc/0x360
> [    2.833002]  worker_thread+0x48/0x450
> [    2.833012]  kthread+0x120/0x150
> [    2.833020]  ret_from_fork+0x10/0x18
> 
> 
> Bisecting this, I found that it started to occur because, as of Linux
> v5.13, CONFIG_PCI_MSI_ARCH_FALLBACKS is no longer enabled by default.
> Previously it only happened to be enabled because Renesas R-Car selected it.
> 
> When booting with ACPI, I see that pci_msi_setup_msi_irqs() ends up
> calling arch_setup_msi_irqs(), and if CONFIG_PCI_MSI_ARCH_FALLBACKS is
> not enabled, this calls WARN_ON_ONCE(1).
> 
> So the question is, should this be enabled by default for ARM64? I see
> a lot of other architectures enabling this when PCI_MSI is enabled. So
> I am wondering if we should be doing something like ...
> 
> diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
> index 1f212b47a48a..4bbd81bab809 100644
> --- a/arch/arm64/Kconfig
> +++ b/arch/arm64/Kconfig
> @@ -202,6 +202,7 @@ config ARM64
>         select PCI_DOMAINS_GENERIC if PCI
>         select PCI_ECAM if (ACPI && PCI)
>         select PCI_SYSCALL if PCI
> +       select PCI_MSI_ARCH_FALLBACKS if PCI_MSI
>         select POWER_RESET
>         select POWER_SUPPLY
>         select SPARSE_IRQ

+Thomas, as he's neck-deep in the MSI rework.

No, this definitely is the wrong solution.

arm64 doesn't need any arch fallback (I actually went out of my way to
kill them on this architecture), and requires the individual MSI
controller drivers to do the right thing by using MSI domains.  Adding
this config option makes the warning disappear, but the core issue is
that you have a device that doesn't have an MSI domain associated with
it.

So either your device isn't MSI capable (odd), your host bridge
doesn't make the link with the MSI controller to advertise the MSI
domain (this should normally be dealt with via IORT), or there is a
bug of a similar sort somewhere else.

Getting to the root of this issue would be the right thing to do.

Thanks,

	M.

-- 
Without deviation from the norm, progress is not possible.


