On 2025-03-13 1:06 pm, Robin Murphy wrote:
On 2025-03-13 12:23 pm, Marek Szyprowski wrote:
On 13.03.2025 12:01, Robin Murphy wrote:
On 2025-03-13 9:56 am, Marek Szyprowski wrote:
[...]
This patch landed in yesterday's linux-next as commit bcb81ac6ae3c
("iommu: Get DT/ACPI parsing into the proper probe path"). In my tests I
found that it breaks booting of the ARM64 RK3568-based Odroid-M1 board
(arch/arm64/boot/dts/rockchip/rk3568-odroid-m1.dts). Here is the
relevant kernel log:
...and the bug-flushing-out begins!
Unable to handle kernel NULL pointer dereference at virtual address 00000000000003e8
Mem abort info:
ESR = 0x0000000096000004
EC = 0x25: DABT (current EL), IL = 32 bits
SET = 0, FnV = 0
EA = 0, S1PTW = 0
FSC = 0x04: level 0 translation fault
Data abort info:
ISV = 0, ISS = 0x00000004, ISS2 = 0x00000000
CM = 0, WnR = 0, TnD = 0, TagAccess = 0
GCS = 0, Overlay = 0, DirtyBit = 0, Xs = 0
[00000000000003e8] user address but active_mm is swapper
Internal error: Oops: 0000000096000004 [#1] PREEMPT SMP
Modules linked in:
CPU: 3 UID: 0 PID: 1 Comm: swapper/0 Not tainted 6.14.0-rc3+ #15533
Hardware name: Hardkernel ODROID-M1 (DT)
pstate: 00400009 (nzcv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
pc : devm_kmalloc+0x2c/0x114
lr : rk_iommu_of_xlate+0x30/0x90
...
Call trace:
devm_kmalloc+0x2c/0x114 (P)
rk_iommu_of_xlate+0x30/0x90
Yeah, looks like this is doing something a bit questionable which can't
work properly. TBH the whole dma_dev thing could probably be cleaned up
now that we have proper instances, but for now does this work?
Yes, this patch fixes the problem I've observed.
Reported-by: Marek Szyprowski <m.szyprowski@xxxxxxxxxxx>
Tested-by: Marek Szyprowski <m.szyprowski@xxxxxxxxxxx>
BTW, this dma_dev idea has been borrowed from my exynos_iommu driver and
I doubt it can be cleaned up.
On the contrary, I suspect they both can - it all dates back to when we
had the single global platform bus iommu_ops and the SoC drivers were
forced to bodge their own notion of multiple instances, but with the
modern core code, ops are always called via a valid IOMMU instance or
domain, so in principle it should always be possible to get at an
appropriate IOMMU device now. IIRC it was mostly about allocating and
DMA-mapping the pagetables in domain_alloc, where the private notion of
instances didn't have enough information, but domain_alloc_paging solves
that.
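
To make the point about instances concrete, here is a rough userspace
sketch, not kernel code. Every toy_* name is hypothetical and only models
the shape of the two callbacks: an allocator that gets no device has to
fall back on a driver-private global, whereas one that is handed the
device can reach the IOMMU instance through it.

```c
#include <stddef.h>

/* Toy stand-ins; none of these are the kernel's real definitions. */
struct toy_iommu  { int id; };                   /* one IOMMU instance */
struct toy_device { struct toy_iommu *iommu; };  /* client device and the instance serving it */
struct toy_domain { struct toy_iommu *owner; };  /* domain tied to an instance */

/* Driver-private global, standing in for the dma_dev style workaround. */
static struct toy_iommu *toy_global_iommu;

/* Old shape: no device argument, so only the global is available. */
static int toy_domain_alloc(struct toy_domain *dom)
{
	if (!toy_global_iommu)
		return -1;	/* nothing has set the global yet */
	dom->owner = toy_global_iommu;
	return 0;
}

/* Newer shape: the device is passed in, so the instance is reachable from it. */
static int toy_domain_alloc_paging(struct toy_domain *dom,
				   const struct toy_device *dev)
{
	if (!dev || !dev->iommu)
		return -1;
	dom->owner = dev->iommu;
	return 0;
}
```

In this model the second form works without any global state being set
up first, which is the property the paragraph above relies on.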
Bah, in fact I think I am going to have to do that now, since although
it doesn't crash, rk_domain_alloc_paging() will also be failing for the
same reason. Time to find a PSU for the RK3399 board, I guess...
(Or maybe just move the dma_dev assignment earlier to match Exynos?)
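
The ordering problem can be modelled in plain userspace C. This is only a
sketch under assumptions: the toy_* names are made up, and the idea that
the shared device pointer is still NULL when the of_xlate-style callback
runs is inferred from the oops above rather than shown in the thread.

```c
#include <stddef.h>
#include <stdlib.h>

struct toy_device { const char *name; };

/* Shared pointer, NULL until some probe step assigns it (assumption). */
static struct toy_device *toy_dma_dev;

/* Models a device-managed allocation: it needs a valid owner device. */
static void *toy_devm_alloc(struct toy_device *owner, size_t size)
{
	if (!owner)
		return NULL;	/* the log suggests the real path faults here instead */
	return calloc(1, size);
}

/* Models an of_xlate-style callback allocating per-client data. */
static int toy_of_xlate(void **data_out)
{
	*data_out = toy_devm_alloc(toy_dma_dev, 16);
	return *data_out ? 0 : -1;
}

/* Models the step that assigns the shared pointer; the first caller wins. */
static void toy_probe(struct toy_device *self)
{
	if (!toy_dma_dev)
		toy_dma_dev = self;
}
```

Calling toy_of_xlate() before toy_probe() fails, and calling it afterwards
succeeds, so in this model either doing the assignment earlier or not
depending on the shared pointer at all avoids the failure.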
Thanks,
Robin.