On 2019-10-04 9:37 pm, Tim Harvey wrote:
On Fri, Oct 4, 2019 at 11:34 AM Robin Murphy <robin.murphy@xxxxxxx> wrote:
On 04/10/2019 18:13, Tim Harvey wrote:
[...]
No difference... still need 'arm-smmu.disable_bypass=n' to boot. Are
all four iommu-map props above supposed to be the same? Seems to me
they all point to the same thing which looks wrong.
Hmm... :/
Those mappings just set Stream ID == PCI RID (strictly each one should
only need to cover the bus range assigned to that bridge, but it's not
crucial) which is the same thing the driver assumes for the mmu-masters
property, so either that's wrong and never could have worked anyway -
have you tried VFIO on this platform? - or there are other devices also
mastering through the SMMU that aren't described at all. Are you able to
capture a boot log? The SMMU faults do encode information about the
offending ID, and you can typically correlate their appearance
reasonably well with endpoint drivers probing.
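To make the "Stream ID == PCI RID" point concrete, here is a minimal sketch of how an iommu-map style entry turns a Requester ID into a Stream ID. The entries follow the generic (rid-base, iommu-base, length) layout of the pci-iommu binding; the values are illustrative assumptions, not the properties from this board's device tree, and the function name is hypothetical.

```python
def iommu_map_lookup(rid, entries):
    """Translate a PCI Requester ID to a Stream ID.

    Each entry is (rid_base, iommu_base, length), mirroring the cells of an
    iommu-map property with the IOMMU phandle left out.
    Returns None if no entry covers the RID.
    """
    for rid_base, iommu_base, length in entries:
        if rid_base <= rid < rid_base + length:
            return rid - rid_base + iommu_base
    return None


# Hypothetical identity mapping covering every bus: Stream ID == RID.
IDENTITY_MAP = [(0x0, 0x0, 0x10000)]

# Hypothetical non-trivial mapping: bus 1 only, offset into Stream ID 0x8000.
OFFSET_MAP = [(0x100, 0x8000, 0x100)]

if __name__ == "__main__":
    print(hex(iommu_map_lookup(0x140, IDENTITY_MAP)))
    print(hex(iommu_map_lookup(0x140, OFFSET_MAP)))
```

With an identity entry the lookup is a no-op, which is why four bridges carrying the same identity mapping behave the same as the legacy mmu-masters assumption.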
Robin,
VFIO is enabled in the kernel but I don't know anything about how to
test/use it:
$ grep VFIO .config
CONFIG_KVM_VFIO=y
CONFIG_VFIO_IOMMU_TYPE1=y
CONFIG_VFIO_VIRQFD=y
CONFIG_VFIO=y
# CONFIG_VFIO_NOIOMMU is not set
CONFIG_VFIO_PCI=y
CONFIG_VFIO_PCI_MMAP=y
CONFIG_VFIO_PCI_INTX=y
# CONFIG_VFIO_PLATFORM is not set
# CONFIG_VFIO_MDEV is not set
No worries - since it's a networking-focused SoC I figured there was a
chance you might be using DPDK or similar userspace drivers with the NIC
VFs, but I was just casting around for a quick and easy baseline of
whether the SMMU works at all (another way would be using Qemu to run a
VM with one or more PCI devices assigned).
I do have a boot console yet I'm not seeing any smmu faults at all.
Perhaps I've mis-diagnosed the issue completely. To be clear when I
boot with arm-smmu.disable_bypass=y the serial console appears to not
accept input in userspace and with arm-smmu.disable_bypass=n I'm fine.
I'm using a buildroot initramfs rootfs for simplicity. The system
isn't hung as I originally expected as the LED heartbeat trigger
continues blinking... I just can't get console to accept input.
Curiouser and curiouser... I'm inclined to suspect that the interrupt
configuration might also be messed up, such that the SMMU is blocking
traffic and jammed up due to pending faults, but you're not getting the
IRQ delivered to find out. Does this patch help reveal anything?
http://linux-arm.org/git?p=linux-rm.git;a=commitdiff;h=29ac3648b580920692c9b417b2fc606995826517
(untested, but it's a direct port of the one I've used for SMMUv3 to
diagnose something similar)
This shows:
Yay (ish)!
[ and the tangential challenge would be to find out what the real global
fault interrupt is, 'cause apparently it's not SPI 68... ]
arm-smmu 830000000000.smmu0: Unexpected global fault, this could be serious
arm-smmu 830000000000.smmu0: GFSR 0x80000002, GFSYNR0 0x00000002,
GFSYNR1 0x00000140, GFSYNR2 0x00000000
If that stream ID were a PCI RID, it would be 01:08.0
arm-smmu 830000000000.smmu0: Unexpected global fault, this could be serious
arm-smmu 830000000000.smmu0: GFSR 0x80000002, GFSYNR0 0x00000002,
GFSYNR1 0x00000010, GFSYNR2 0x00000000
And this guy would be 00:02.0
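For reference, the arithmetic behind those two guesses can be sketched as below. It assumes the Stream ID sits in the low 16 bits of GFSYNR1 and that it really is a plain PCI RID (bus in bits 15:8, device in bits 7:3, function in bits 2:0), which is exactly the assumption in question here. The helper names are made up for illustration.

```python
def gfsynr1_to_stream_id(gfsynr1):
    """Extract the Stream ID, assumed to be the low 16 bits of GFSYNR1."""
    return gfsynr1 & 0xFFFF


def rid_to_bdf(rid):
    """Format a 16-bit PCI Requester ID as bus:device.function."""
    bus = (rid >> 8) & 0xFF
    dev = (rid >> 3) & 0x1F
    fn = rid & 0x7
    return f"{bus:02x}:{dev:02x}.{fn}"


if __name__ == "__main__":
    # GFSYNR1 values taken from the fault log in this thread.
    for gfsynr1 in (0x00000140, 0x00000010):
        sid = gfsynr1_to_stream_id(gfsynr1)
        print(f"GFSYNR1 {gfsynr1:#010x} -> SID {sid:#06x} -> {rid_to_bdf(sid)}")
```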
So it seems that either the stream ID mapping is non-trivial (and
"mmu-masters" couldn't have worked), or there are secret magic endpoints
writing to memory during boot. Either way it probably needs some input
from Cavium/Marvell to get straight. In the meantime, unless just
disabling and ignoring the SMMU altogether is a viable option, I guess
we have to resign ourselves to this being one of those "non-good" reasons
for needing global bypass :(
Robin.
(note to self: it would probably be useful if we dump GFAR in these logs
too. These are all writes, so it's possible they could be MSI attempts
targeting the ITS rather than DMA as such)
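As a rough aid for reading the register dumps, the sketch below names the set bits in GFSR and GFSYNR0. The bit positions are as I recall them from the SMMUv2 architecture specification and should be checked against it before relying on them; the tables and the decode helper are illustrative, not code from the driver.

```python
# Bit positions assumed from the SMMUv2 architecture spec; verify before use.
GFSR_BITS = {
    0: "ICF",     # invalid context fault
    1: "USF",     # unidentified stream fault
    2: "SMCF",    # stream match conflict fault
    3: "UCBF",    # unimplemented context bank fault
    4: "UCIF",    # unimplemented context interrupt fault
    5: "CAF",     # configuration access fault
    6: "EF",      # external fault
    7: "PF",      # permission fault
    8: "UUT",     # unsupported upstream transaction
    31: "MULTI",  # multiple faults recorded
}

GFSYNR0_BITS = {
    0: "NESTED",
    1: "WNR",     # write, not read
    2: "PNU",
    3: "IND",
    4: "NSSTATE",
    5: "NSATTR",
    6: "ATS",
}


def decode(value, names):
    """Return the names of all bits set in value, lowest bit first."""
    return [name for bit, name in sorted(names.items()) if value & (1 << bit)]


if __name__ == "__main__":
    # Values taken from the fault log in this thread.
    print("GFSR   ", decode(0x80000002, GFSR_BITS))
    print("GFSYNR0", decode(0x00000002, GFSYNR0_BITS))
```

Under those assumptions the logged values read as multiple unidentified-stream faults on write transactions, which matches the "these are all writes" observation.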
arm-smmu 830000000000.smmu0: Unexpected global fault, this could be serious
arm-smmu 830000000000.smmu0: GFSR 0x80000002, GFSYNR0 0x00000002,
GFSYNR1 0x00000010, GFSYNR2 0x00000000
arm-smmu 830000000000.smmu0: Unexpected global fault, this could be serious
arm-smmu 830000000000.smmu0: GFSR 0x80000002, GFSYNR0 0x00000002,
GFSYNR1 0x00000010, GFSYNR2 0x00000000
arm-smmu 830000000000.smmu0: Unexpected global fault, this could be serious
arm-smmu 830000000000.smmu0: GFSR 0x80000002, GFSYNR0 0x00000002,
GFSYNR1 0x00000010, GFSYNR2 0x00000000
arm-smmu 830000000000.smmu0: Unexpected global fault, this could be serious
arm-smmu 830000000000.smmu0: GFSR 0x80000002, GFSYNR0 0x00000002,
GFSYNR1 0x00000010, GFSYNR2 0x00000000
arm-smmu 830000000000.smmu0: Unexpected global fault, this could be serious
arm-smmu 830000000000.smmu0: GFSR 0x80000002, GFSYNR0 0x00000002,
GFSYNR1 0x00000010, GFSYNR2 0x00000000
arm-smmu 830000000000.smmu0: Unexpected global fault, this could be serious
arm-smmu 830000000000.smmu0: GFSR 0x80000002, GFSYNR0 0x00000002,
GFSYNR1 0x00000010, GFSYNR2 0x00000000
arm-smmu 830000000000.smmu0: Unexpected global fault, this could be serious
arm-smmu 830000000000.smmu0: GFSR 0x80000002, GFSYNR0 0x00000002,
GFSYNR1 0x00000010, GFSYNR2 0x00000000
arm-smmu 830000000000.smmu0: Unexpected global fault, this could be serious
arm-smmu 830000000000.smmu0: GFSR 0x80000002, GFSYNR0 0x00000002,
GFSYNR1 0x00000010, GFSYNR2 0x00000000
...
arm-smmu 830000000000.smmu0: Unexpected global fault, this could be serious
arm-smmu 830000000000.smmu0: GFSR 0x80000002, GFSYNR0 0x00000002,
GFSYNR1 0x00000010, GFSYNR2 0x00000000
^^^ these two repeat over and over
That said, it's also puzzling that no other drivers are reporting DMA
errors or timeouts either - is there any chance that some device is set
running by the firmware/bootloader and not taken over by a kernel driver?
Anything is possible - I'm using the Cavium 'BDK' as boot firmware to
configure the board, which sits in front of Arm Trusted Firmware and the
bootloader.
Tim