On 01.07.2021 11:11, Robin Murphy wrote: > On 2021-07-01 10:01, Will Deacon wrote: >> On Thu, Jul 01, 2021 at 10:29:29AM +0200, Marek Szyprowski wrote: >>> Hi Robin, >>> >>> On 30.06.2021 16:01, Robin Murphy wrote: >>>> On 2021-06-30 14:48, Marek Szyprowski wrote: >>>>> On 30.06.2021 14:59, Will Deacon wrote: >>>>>> On Wed, Jun 30, 2021 at 02:48:15PM +0200, Marek Szyprowski wrote: >>>>>>> On 08.06.2021 18:45, Amey Narkhede wrote: >>>>>>>> If device registration fails, remove sysfs attribute >>>>>>>> and if setting bus callbacks fails, unregister the device >>>>>>>> and cleanup the sysfs attribute. >>>>>>>> >>>>>>>> Signed-off-by: Amey Narkhede <ameynarkhede03@xxxxxxxxx> >>>>>>> This patch landed in linux-next some time ago as commit >>>>>>> 249c9dc6aa0d >>>>>>> ("iommu/arm: Cleanup resources in case of probe error path"). After >>>>>>> bisecting and some manual searching I finally found that it is >>>>>>> responsible for breaking s2idle on DragonBoard 410c. Here is the >>>>>>> log >>>>>>> (captured with no_console_suspend): >>>>>>> >>>>>>> # time rtcwake -s10 -mmem >>>>>>> rtcwake: wakeup from "mem" using /dev/rtc0 at Thu Jan� 1 >>>>>>> 00:02:13 1970 >>>>>>> PM: suspend entry (s2idle) >>>>>>> Filesystems sync: 0.002 seconds >>>>>>> Freezing user space processes ... (elapsed 0.006 seconds) done. >>>>>>> OOM killer disabled. >>>>>>> Freezing remaining freezable tasks ... (elapsed 0.004 seconds) >>>>>>> done. >>>>>>> Unable to handle kernel NULL pointer dereference at virtual address >>>>>>> 0000000000000070 >>>>>>> Mem abort info: >>>>>>> �� � ESR = 0x96000006 >>>>>>> �� � EC = 0x25: DABT (current EL), IL = 32 bits >>>>>>> �� � SET = 0, FnV = 0 >>>>>>> �� � EA = 0, S1PTW = 0 >>>>>>> �� � FSC = 0x06: level 2 translation fault >>>>>>> Data abort info: >>>>>>> �� � ISV = 0, ISS = 0x00000006 >>>>>>> �� � CM = 0, WnR = 0 >>>>>>> user pgtable: 4k pages, 48-bit VAs, pgdp=000000008ad08000 >>>>>>> [0000000000000070] pgd=0800000085c3c003, p4d=0800000085c3c003, >>>>>>> pud=0800000088dcf003, pmd=0000000000000000 >>>>>>> Internal error: Oops: 96000006 [#1] PREEMPT SMP >>>>>>> Modules linked in: bluetooth ecdh_generic ecc rfkill ipv6 ax88796b >>>>>>> venus_enc venus_dec videobuf2_dma_contig asix crct10dif_ce adv7511 >>>>>>> snd_soc_msm8916_analog qcom_spmi_temp_alarm rtc_pm8xxx qcom_pon >>>>>>> qcom_camss qcom_spmi_vadc videobuf2_dma_sg qcom_vadc_common msm >>>>>>> venus_core v4l2_fwnode v4l2_async snd_soc_msm8916_digital >>>>>>> videobuf2_memops snd_soc_lpass_apq8016 snd_soc_lpass_cpu >>>>>>> v4l2_mem2mem >>>>>>> snd_soc_lpass_platform snd_soc_apq8016_sbc videobuf2_v4l2 >>>>>>> snd_soc_qcom_common qcom_rng videobuf2_common i2c_qcom_cci >>>>>>> qnoc_msm8916 >>>>>>> videodev mc icc_smd_rpm mdt_loader socinfo display_connector >>>>>>> rmtfs_mem >>>>>>> CPU: 1 PID: 1522 Comm: rtcwake Not tainted 5.13.0-next-20210629 >>>>>>> #3592 >>>>>>> Hardware name: Qualcomm Technologies, Inc. APQ 8016 SBC (DT) >>>>>>> pstate: 80000005 (Nzcv daif -PAN -UAO -TCO BTYPE=--) >>>>>>> pc : msm_runtime_suspend+0x1c/0x60 [msm] >>>>>>> lr : msm_pm_suspend+0x18/0x38 [msm] >>>>>>> ... >>>>>>> Call trace: >>>>>>> �� �msm_runtime_suspend+0x1c/0x60 [msm] >>>>>>> �� �msm_pm_suspend+0x18/0x38 [msm] >>>>>>> �� �dpm_run_callback+0x84/0x378 >>>>>> I wonder if we're missing a pm_runtime_disable() call on the failure >>>>>> path? >>>>>> i.e. something like the diff below... >>>>> >>>>> I've checked and it doesn't fix anything. >>>> >>>> What's happened previously? Has an IOMMU actually failed to probe, or >>>> is this a fiddly "code movement unveils latent bug elsewhere" kind of >>>> thing? There doesn't look to be much capable of going wrong in >>>> msm_runtime_suspend() itself, so is the DRM driver also in a broken >>>> half-probed state where it's left its pm_runtime_ops behind without >>>> its drvdata being valid? >>>> >>> I finally had some time to analyze this issue. It turned out that with >>> this patch, iommu fails to probe for soc:iommu@1f08000 device, while it >>> worked fine before. This happens because this patch adds a check for >>> the >>> return value of the bus_set_iommu() in >>> drivers/iommu/arm/arm-smmu/qcom_iommu.c. When I removed that check, it >>> probes successfully again. It looks that there are already iommu ops >>> registered for platform bus, before qcom_iommu probes. On the other >>> hand, if I remember correctly they are not used during the device >>> registration, but they are needed for some legacy stuff. I can send a >>> patch restoring old code flow if you think that this is a right >>> solution. >> >> Yes, let's just revert the qcom_iommu.c changes from that patch for now. >> The pm runtime stuff looks dodgy anyway so I think this needs more >> thought. > > Oh, right, blindly returning the -EBUSY from bus_set_iommu() because > we're not the first instance to probe is definitely the wrong thing to > do as well. It's still not clear why failing makes the DRM driver fall > over, but +1 to qcom-iommu needing some deeper consideration. I've just checked and bus_set_iommu() is called for every 'qcom,msm-iommu-v1' device in the system, thus it fails for the second and next devices. Best regards -- Marek Szyprowski, PhD Samsung R&D Institute Poland