On 23/07/2019 22:39, Jon Hunter wrote:
On 23/07/2019 14:19, Robin Murphy wrote:
...
Do you know if the SMMU interrupts are working correctly? If not, it's
possible that an incorrect address or mapping direction could lead to
the DMA transaction just being silently terminated without any fault
indication, which generally presents as inexplicable weirdness (I've
certainly seen that on another platform with the mix of an unsupported
interrupt controller and an 'imperfect' ethernet driver).
If I simply remove the iommu node for the ethernet controller, then I
see lots of ...
[ 6.296121] arm-smmu 12000000.iommu: Unexpected global fault, this could be serious
[ 6.296125] arm-smmu 12000000.iommu: GFSR 0x00000002, GFSYNR0 0x00000000, GFSYNR1 0x00000014, GFSYNR2 0x00000000
So I assume that this is triggering the SMMU interrupt correctly.
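As a reading aid, the global fault record above can be decoded by hand. The sketch below is not from the thread. It assumes the layout used by the Linux arm-smmu driver, where bit 1 of GFSR is USF (unidentified stream fault) and the low 16 bits of GFSYNR1 carry the stream ID. The helper name decode_global_fault is hypothetical.

```python
# Assumption: bit position follows the Linux arm-smmu driver's USF definition.
GFSR_USF = 1 << 1  # unidentified stream fault


def decode_global_fault(gfsr, gfsynr1):
    """Return (flag names, stream ID) for a global fault record."""
    flags = []
    if gfsr & GFSR_USF:
        flags.append("USF")
    # Report any other set bits by position rather than guessing their names.
    rest = gfsr & ~GFSR_USF
    flags += [f"bit{n}" for n in range(32) if rest & (1 << n)]
    # Assumption: the stream ID sits in the low 16 bits of GFSYNR1.
    stream_id = gfsynr1 & 0xFFFF
    return flags, stream_id


# Values copied from the log lines above.
flags, sid = decode_global_fault(0x00000002, 0x00000014)
print(flags, hex(sid))
```

Under those assumptions the record reads as a transaction from stream ID 0x14 that matched no stream mapping, which is consistent with the ethernet controller's iommu node having been removed.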
According to tegra186.dtsi it appears you're using the MMU-500 combined
interrupt, so if global faults are being delivered then context faults
*should* also, but I'd be inclined to try a quick hack of the relevant
stmmac_desc_ops::set_addr callback to write some bogus unmapped address
just to make sure arm_smmu_context_fault() then screams as expected, and
we're not missing anything else.
I hacked the driver and forced the address to zero for a test and
in doing so I see ...
[ 10.440072] arm-smmu 12000000.iommu: Unhandled context fault: fsr=0x402, iova=0x00000000, fsynr=0x1c0011, cbfrsynra=0x14, cb=0
So looks like the interrupts are working AFAICT.
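The context fault line above can be unpacked the same way. This is a minimal sketch rather than anything posted in the thread. It assumes the FSR flag positions and the FSYNR0 write-not-read bit (bit 4) as defined in the Linux arm-smmu driver, treats FSR bits [10:9] as the architectural FORMAT field, and takes the stream ID from the low 16 bits of CBFRSYNRA. The helper name decode_context_fault is hypothetical.

```python
# Assumption: flag positions as defined in the Linux arm-smmu driver header.
FSR_FLAGS = {
    1: "TF",       # translation fault
    2: "AFF",      # access flag fault
    3: "PF",       # permission fault
    4: "EF",       # external fault
    5: "TLBMCF",   # TLB match conflict fault
    6: "TLBLKF",   # TLB lock fault
    7: "ASF",      # address size fault
    8: "UUT",      # unsupported upstream transaction
    30: "SS",      # stalled
    31: "MULTI",   # multiple faults recorded
}
FSYNR0_WNR = 1 << 4  # set when the faulting access was a write


def decode_context_fault(fsr, fsynr0, cbfrsynra):
    """Summarise a context fault record as a dict."""
    flags = [name for bit, name in sorted(FSR_FLAGS.items()) if fsr & (1 << bit)]
    return {
        "flags": flags,
        "format": (fsr >> 9) & 0x3,       # assumed FORMAT field, bits [10:9]
        "write": bool(fsynr0 & FSYNR0_WNR),
        "stream_id": cbfrsynra & 0xFFFF,  # assumed low 16 bits
    }


# Values copied from the log line above.
print(decode_context_fault(0x402, 0x1C0011, 0x14))
```

Read this way, the record is a translation fault on a write to iova 0 from stream ID 0x14, the same ID that appeared in GFSYNR1 of the earlier global fault, so the forced zero address is what the SMMU caught.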
OK, that's good, thanks for confirming. Unfortunately that now leaves us
with the challenge of figuring out how things are managing to go wrong
*without* ever faulting... :)
I wonder if we can provoke the failure on non-IOMMU platforms with
"swiotlb=force" - I have a few boxes I could potentially test that on,
but sadly forgot my plan to bring one with me this morning.
Robin.