On Tue, Jan 07, 2025 at 04:24:59PM -0500, Parker Newman wrote: > From: Parker Newman <pnewman@xxxxxxxxxxxxxxx> > > Nvidia's Tegra MGBE controllers require the IOMMU "Stream ID" (SID) to be > written to the MGBE_WRAP_AXI_ASID0_CTRL register. > > The current driver is hard coded to use MGBE0's SID for all controllers. > This causes softirq time outs and kernel panics when using controllers > other than MGBE0. > > Example dmesg errors when an ethernet cable is connected to MGBE1: > > [ 116.133290] tegra-mgbe 6910000.ethernet eth1: Link is Up - 1Gbps/Full - flow control rx/tx > [ 121.851283] tegra-mgbe 6910000.ethernet eth1: NETDEV WATCHDOG: CPU: 5: transmit queue 0 timed out 5690 ms > [ 121.851782] tegra-mgbe 6910000.ethernet eth1: Reset adapter. > [ 121.892464] tegra-mgbe 6910000.ethernet eth1: Register MEM_TYPE_PAGE_POOL RxQ-0 > [ 121.905920] tegra-mgbe 6910000.ethernet eth1: PHY [stmmac-1:00] driver [Aquantia AQR113] (irq=171) > [ 121.907356] tegra-mgbe 6910000.ethernet eth1: Enabling Safety Features > [ 121.907578] tegra-mgbe 6910000.ethernet eth1: IEEE 1588-2008 Advanced Timestamp supported > [ 121.908399] tegra-mgbe 6910000.ethernet eth1: registered PTP clock > [ 121.908582] tegra-mgbe 6910000.ethernet eth1: configuring for phy/10gbase-r link mode > [ 125.961292] tegra-mgbe 6910000.ethernet eth1: Link is Up - 1Gbps/Full - flow control rx/tx > [ 181.921198] rcu: INFO: rcu_preempt detected stalls on CPUs/tasks: > [ 181.921404] rcu: 7-....: (1 GPs behind) idle=540c/1/0x4000000000000002 softirq=1748/1749 fqs=2337 > [ 181.921684] rcu: (detected by 4, t=6002 jiffies, g=1357, q=1254 ncpus=8) > [ 181.921878] Sending NMI from CPU 4 to CPUs 7: > [ 181.921886] NMI backtrace for cpu 7 > [ 181.922131] CPU: 7 UID: 0 PID: 0 Comm: swapper/7 Kdump: loaded Not tainted 6.13.0-rc3+ #6 > [ 181.922390] Hardware name: NVIDIA CTI Forge + Orin AGX/Jetson, BIOS 202402.1-Unknown 10/28/2024 > [ 181.922658] pstate: 40400009 (nZcv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--) > [ 181.922847] pc : handle_softirqs+0x98/0x368 > [ 181.922978] lr : __do_softirq+0x18/0x20 > [ 181.923095] sp : ffff80008003bf50 > [ 181.923189] x29: ffff80008003bf50 x28: 0000000000000008 x27: 0000000000000000 > [ 181.923379] x26: ffffce78ea277000 x25: 0000000000000000 x24: 0000001c61befda0 > [ 181.924486] x23: 0000000060400009 x22: ffffce78e99918bc x21: ffff80008018bd70 > [ 181.925568] x20: ffffce78e8bb00d8 x19: ffff80008018bc20 x18: 0000000000000000 > [ 181.926655] x17: ffff318ebe7d3000 x16: ffff800080038000 x15: 0000000000000000 > [ 181.931455] x14: ffff000080816680 x13: ffff318ebe7d3000 x12: 000000003464d91d > [ 181.938628] x11: 0000000000000040 x10: ffff000080165a70 x9 : ffffce78e8bb0160 > [ 181.945804] x8 : ffff8000827b3160 x7 : f9157b241586f343 x6 : eeb6502a01c81c74 > [ 181.953068] x5 : a4acfcdd2e8096bb x4 : ffffce78ea277340 x3 : 00000000ffffd1e1 > [ 181.960329] x2 : 0000000000000101 x1 : ffffce78ea277340 x0 : ffff318ebe7d3000 > [ 181.967591] Call trace: > [ 181.970043] handle_softirqs+0x98/0x368 (P) > [ 181.974240] __do_softirq+0x18/0x20 > [ 181.977743] ____do_softirq+0x14/0x28 > [ 181.981415] call_on_irq_stack+0x24/0x30 > [ 181.985180] do_softirq_own_stack+0x20/0x30 > [ 181.989379] __irq_exit_rcu+0x114/0x140 > [ 181.993142] irq_exit_rcu+0x14/0x28 > [ 181.996816] el1_interrupt+0x44/0xb8 > [ 182.000316] el1h_64_irq_handler+0x14/0x20 > [ 182.004343] el1h_64_irq+0x80/0x88 > [ 182.007755] cpuidle_enter_state+0xc4/0x4a8 (P) > [ 182.012305] cpuidle_enter+0x3c/0x58 > [ 182.015980] cpuidle_idle_call+0x128/0x1c0 > [ 182.020005] do_idle+0xe0/0xf0 > [ 182.023155] cpu_startup_entry+0x3c/0x48 > [ 182.026917] secondary_start_kernel+0xdc/0x120 > [ 182.031379] __secondary_switched+0x74/0x78 > [ 212.971162] rcu: INFO: rcu_preempt detected expedited stalls on CPUs/tasks: { 7-.... } 6103 jiffies s: 417 root: 0x80/. > [ 212.985935] rcu: blocking rcu_node structures (internal RCU debug): > [ 212.992758] Sending NMI from CPU 0 to CPUs 7: > [ 212.998539] NMI backtrace for cpu 7 > [ 213.004304] CPU: 7 UID: 0 PID: 0 Comm: swapper/7 Kdump: loaded Not tainted 6.13.0-rc3+ #6 > [ 213.016116] Hardware name: NVIDIA CTI Forge + Orin AGX/Jetson, BIOS 202402.1-Unknown 10/28/2024 > [ 213.030817] pstate: 40400009 (nZcv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--) > [ 213.040528] pc : handle_softirqs+0x98/0x368 > [ 213.046563] lr : __do_softirq+0x18/0x20 > [ 213.051293] sp : ffff80008003bf50 > [ 213.055839] x29: ffff80008003bf50 x28: 0000000000000008 x27: 0000000000000000 > [ 213.067304] x26: ffffce78ea277000 x25: 0000000000000000 x24: 0000001c61befda0 > [ 213.077014] x23: 0000000060400009 x22: ffffce78e99918bc x21: ffff80008018bd70 > [ 213.087339] x20: ffffce78e8bb00d8 x19: ffff80008018bc20 x18: 0000000000000000 > [ 213.097313] x17: ffff318ebe7d3000 x16: ffff800080038000 x15: 0000000000000000 > [ 213.107201] x14: ffff000080816680 x13: ffff318ebe7d3000 x12: 000000003464d91d > [ 213.116651] x11: 0000000000000040 x10: ffff000080165a70 x9 : ffffce78e8bb0160 > [ 213.127500] x8 : ffff8000827b3160 x7 : 0a37b344852820af x6 : 3f049caedd1ff608 > [ 213.138002] x5 : cff7cfdbfaf31291 x4 : ffffce78ea277340 x3 : 00000000ffffde04 > [ 213.150428] x2 : 0000000000000101 x1 : ffffce78ea277340 x0 : ffff318ebe7d3000 > [ 213.162063] Call trace: > [ 213.165494] handle_softirqs+0x98/0x368 (P) > [ 213.171256] __do_softirq+0x18/0x20 > [ 213.177291] ____do_softirq+0x14/0x28 > [ 213.182017] call_on_irq_stack+0x24/0x30 > [ 213.186565] do_softirq_own_stack+0x20/0x30 > [ 213.191815] __irq_exit_rcu+0x114/0x140 > [ 213.196891] irq_exit_rcu+0x14/0x28 > [ 213.202401] el1_interrupt+0x44/0xb8 > [ 213.207741] el1h_64_irq_handler+0x14/0x20 > [ 213.213519] el1h_64_irq+0x80/0x88 > [ 213.217541] cpuidle_enter_state+0xc4/0x4a8 (P) > [ 213.224364] cpuidle_enter+0x3c/0x58 > [ 213.228653] cpuidle_idle_call+0x128/0x1c0 > [ 213.233993] do_idle+0xe0/0xf0 > [ 213.237928] cpu_startup_entry+0x3c/0x48 > [ 213.243791] secondary_start_kernel+0xdc/0x120 > [ 213.249830] __secondary_switched+0x74/0x78 > > This bug has existed since the dwmac-tegra driver was added in Dec 2022 > (See Fixes tag below for commit hash). > > The Tegra234 SOC has 4 MGBE controllers, however Nvidia's Developer Kit > only uses MGBE0 which is why the bug was not found previously. Connect Tech > has many products that use 2 (or more) MGBE controllers. > > The solution is to read the controller's SID from the existing "iommus" > device tree property. The 2nd field of the "iommus" device tree property > is the controller's SID. > > Device tree snippet from tegra234.dtsi showing MGBE1's "iommus" property: > > smmu_niso0: iommu@12000000 { > compatible = "nvidia,tegra234-smmu", "nvidia,smmu-500"; > ... > } > > /* MGBE1 */ > ethernet@6900000 { > compatible = "nvidia,tegra234-mgbe"; > ... > iommus = <&smmu_niso0 TEGRA234_SID_MGBE_VF1>; > ... > } > > Nvidia's arm-smmu driver reads the "iommus" property and stores the SID in > the MGBE device's "fwspec" struct. The dwmac-tegra driver can access the > SID using the tegra_dev_iommu_get_stream_id() helper function found in > linux/iommu.h. > > Calling tegra_dev_iommu_get_stream_id() should not fail unless the "iommus" > property is removed from the device tree or the IOMMU is disabled. > > While the Tegra234 SOC technically supports bypassing the IOMMU, it is not > supported by the current firmware, has not been tested and not recommended. > More detailed discussion with Thierry Reding from Nvidia linked below. > > Fixes: d8ca113724e7 ("net: stmmac: tegra: Add MGBE support") > Link: https://lore.kernel.org/netdev/cover.1731685185.git.pnewman@xxxxxxxxxxxxxxx > Signed-off-by: Parker Newman <pnewman@xxxxxxxxxxxxxxx> > --- > > Changes v2: > - dropped cover letter > - added more detail to commit message > - rebased to latest netdev tree > > drivers/net/ethernet/stmicro/stmmac/dwmac-tegra.c | 14 +++++++++++--- > 1 file changed, 11 insertions(+), 3 deletions(-) Acked-by: Thierry Reding <treding@xxxxxxxxxx>
Attachment:
signature.asc
Description: PGP signature