On 17.03.2025 19:22, Robin Murphy wrote:
> On 17/03/2025 7:37 am, Marek Szyprowski wrote:
>> On 13.03.2025 15:12, Robin Murphy wrote:
>>> On 2025-03-13 1:06 pm, Robin Murphy wrote:
>>>> On 2025-03-13 12:23 pm, Marek Szyprowski wrote:
>>>>> On 13.03.2025 12:01, Robin Murphy wrote:
>>>>>> On 2025-03-13 9:56 am, Marek Szyprowski wrote:
>>>>>> [...]
>>>>>>> This patch landed in yesterday's linux-next as commit bcb81ac6ae3c
>>>>>>> ("iommu: Get DT/ACPI parsing into the proper probe path"). In my
>>>>>>> tests I found it breaks booting of ARM64 RK3568-based Odroid-M1 board
>>>>>>> (arch/arm64/boot/dts/rockchip/rk3568-odroid-m1.dts). Here is the
>>>>>>> relevant kernel log:
>>>>>>
>>>>>> ...and the bug-flushing-out begins!
>>>>>>
>>>>>>> Unable to handle kernel NULL pointer dereference at virtual address
>>>>>>> 00000000000003e8
>>>>>>> Mem abort info:
>>>>>>>   ESR = 0x0000000096000004
>>>>>>>   EC = 0x25: DABT (current EL), IL = 32 bits
>>>>>>>   SET = 0, FnV = 0
>>>>>>>   EA = 0, S1PTW = 0
>>>>>>>   FSC = 0x04: level 0 translation fault
>>>>>>> Data abort info:
>>>>>>>   ISV = 0, ISS = 0x00000004, ISS2 = 0x00000000
>>>>>>>   CM = 0, WnR = 0, TnD = 0, TagAccess = 0
>>>>>>>   GCS = 0, Overlay = 0, DirtyBit = 0, Xs = 0
>>>>>>> [00000000000003e8] user address but active_mm is swapper
>>>>>>> Internal error: Oops: 0000000096000004 [#1] PREEMPT SMP
>>>>>>> Modules linked in:
>>>>>>> CPU: 3 UID: 0 PID: 1 Comm: swapper/0 Not tainted 6.14.0-rc3+ #15533
>>>>>>> Hardware name: Hardkernel ODROID-M1 (DT)
>>>>>>> pstate: 00400009 (nzcv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
>>>>>>> pc : devm_kmalloc+0x2c/0x114
>>>>>>> lr : rk_iommu_of_xlate+0x30/0x90
>>>>>>> ...
>>>>>>> Call trace:
>>>>>>>   devm_kmalloc+0x2c/0x114 (P)
>>>>>>>   rk_iommu_of_xlate+0x30/0x90
>>>>>>
>>>>>> Yeah, looks like this is doing something a bit questionable which
>>>>>> can't work properly.
>>>>>> TBH the whole dma_dev thing could probably be cleaned up
>>>>>> now that we have proper instances, but for now does this work?
>>>>>
>>>>> Yes, this patch fixes the problem I've observed.
>>>>>
>>>>> Reported-by: Marek Szyprowski <m.szyprowski@xxxxxxxxxxx>
>>>>> Tested-by: Marek Szyprowski <m.szyprowski@xxxxxxxxxxx>
>>>>>
>>>>> BTW, this dma_dev idea has been borrowed from my exynos_iommu driver
>>>>> and I doubt it can be cleaned up.
>>>>
>>>> On the contrary I suspect they both can - it all dates back to when
>>>> we had the single global platform bus iommu_ops and the SoC drivers
>>>> were forced to bodge their own notion of multiple instances, but with
>>>> the modern core code, ops are always called via a valid IOMMU
>>>> instance or domain, so in principle it should always be possible to
>>>> get at an appropriate IOMMU device now. IIRC it was mostly about
>>>> allocating and DMA-mapping the pagetables in domain_alloc, where the
>>>> private notion of instances didn't have enough information, but
>>>> domain_alloc_paging solves that.
>>>
>>> Bah, in fact I think I am going to have to do that now, since although
>>> it doesn't crash, rk_domain_alloc_paging() will also be failing for
>>> the same reason. Time to find a PSU for the RK3399 board, I guess...
>>>
>>> (Or maybe just move the dma_dev assignment earlier to match Exynos?)
>>
>> Well I just found that Exynos IOMMU is also broken on some on my test
>> boards. It looks that the runtime pm links are somehow not correctly
>> established. I will try to analyze this later in the afternoon.
>
> Hmm, I tried to get an Odroid-XU3 up and running, but it seems unable
> to boot my original 6.14-rc3-based branch - even with the IOMMU driver
> disabled, it's consistently dying somewhere near (or just after) init
> with what looks like some catastrophic memory corruption issue - very
> occasionally it's managed to print the first line of various different
> panics.
>
> Before that point though, with the IOMMU driver enabled it does appear
> to show signs of working OK:
>
> [    0.649703] exynos-sysmmu 14650000.sysmmu: hardware version: 3.3
> [    0.654220] platform 14450000.mixer: Adding to iommu group 1
> ...
> [    2.680920] exynos-mixer 14450000.mixer:
> exynos_iommu_attach_device: Attached IOMMU with pgtable 0x42924000
> ...
> [    5.196674] exynos-mixer 14450000.mixer:
> exynos_iommu_identity_attach: Restored IOMMU to IDENTITY from pgtable
> 0x42924000
> [    5.207091] exynos-mixer 14450000.mixer:
> exynos_iommu_attach_device: Attached IOMMU with pgtable 0x42884000
>
>
> The multi-instance stuff in probe/release does look a bit suspect,
> however - seems like the second instance probe would overwrite the
> first instance's links, and then there would be a double-del() if the
> device were ever actually released again? I may have made that much
> more likely to happen, but I suspect it was already possible with
> async driver probe...

That is really strange. My Odroid XU3 boots fine from commit
bcb81ac6ae3c ("iommu: Get DT/ACPI parsing into the proper probe path"),
although the IOMMU does not seem to be working correctly. I've tested
this with the 14450000.mixer device (one needs to attach an HDMI cable
to get it activated) and it looks like the video data is not being read
from memory at all (a lack of VSYNC is reported, but no IOMMU fault).
However, from time to time, everything initializes and works properly.

It looks like this is somehow related to the different IOMMU/DMA-mapping
glue code, as the other boards (ARM64-based) with exactly the same
Exynos IOMMU driver always work fine. I've tried to figure out what
actually happens, but so far I haven't found anything conclusive.
Disabling the call to dev->bus->dma_configure(dev) from
iommu_init_device() seems to fix this, but this is almost equivalent to
reverting the $subject patch. I don't get why calling it in
iommu_init_device() causes problems.
It also doesn't look like this is in any way related to the
multi-instance stuff, as the same happens if I leave only a single
exynos-sysmmu instance and its client (only the 14450000.mixer device
in the system).

Best regards
-- 
Marek Szyprowski, PhD
Samsung R&D Institute Poland