I hit an issue with the db820c while testing the 4.15 next push on Friday and I wanted to write down what I've seen so far. While running a series of tests, I was getting an AHB timeout that resulted in a reboot (which typically indicates an unclocked access). A bit of experimentation seemed to point the finger at pm_runtime. Our current code has a pm_runtime_get_sync() / pm_runtime_put_sync() pair wrapping the iommu_unmap calls; this was to ensure that the iommu was clocked during unmap so it could do a TLB invalidate. Removing these calls eliminated the problem. Next, I tried removing the GPU clock/regulator disables from the pm_runtime callbacks. This still resulted in an AHB timeout, but then I discovered that returning an error from the PM runtime suspend callback *does* work (no hang).

I then turned on the tracepoints for pm_runtime and inserted a few trace_printk() calls in our code. The trace below shows the pm_runtime_put_sync() called from msm_iommu_unmap(); you will note that the GPU is linked to the iommu driver through a device_link:
 drm-api-test-5032 [003] ....   142.560420: msm_iommu_unmap: msm_iommu_unmap: pm_runtime_put_sync
 drm-api-test-5032 [003] d..1   142.560422: rpm_idle: b00000.gpu flags-4 cnt-0 dep-0 auto-1 p-0 irq-0 child-0
 drm-api-test-5032 [003] d..1   142.560423: rpm_return_int: rpm_idle+0x3c/0x380:b00000.gpu ret=0
 drm-api-test-5032 [003] d..1   142.560425: rpm_suspend: b00000.gpu flags-c cnt-0 dep-0 auto-1 p-0 irq-0 child-0
 drm-api-test-5032 [003] d..1   142.567942: rpm_idle: b40000.arm,smmu flags-5 cnt-0 dep-0 auto-1 p-0 irq-0 child-0
 drm-api-test-5032 [003] dn.1   142.567949: rpm_return_int: rpm_idle+0x214/0x380:b40000.arm,smmu ret=0
     kworker/3:0-26   [003] d..1   142.567965: rpm_idle: b40000.arm,smmu flags-2 cnt-0 dep-0 auto-1 p-0 irq-0 child-0
     kworker/3:0-26   [003] d..1   142.567968: rpm_return_int: rpm_idle+0x3c/0x380:b40000.arm,smmu ret=0
     kworker/3:0-26   [003] d..1   142.567969: rpm_suspend: b40000.arm,smmu flags-a cnt-0 dep-0 auto-1 p-0 irq-0 child-0
     kworker/3:0-26   [003] d..1   142.567978: rpm_idle: soc flags-1 cnt-0 dep-1 auto-1 p-0 irq-0 child-7
     kworker/3:0-26   [003] d..1   142.567979: rpm_return_int: rpm_idle+0x3c/0x380:soc ret=-13
     kworker/3:0-26   [003] d..1   142.567980: rpm_return_int: rpm_suspend+0x158/0x670:b40000.arm,smmu ret=0
 drm-api-test-5032 [003] d..1   142.567986: rpm_idle: soc flags-1 cnt-0 dep-1 auto-1 p-0 irq-0 child-6
 drm-api-test-5032 [003] d..1   142.567987: rpm_return_int: rpm_idle+0x3c/0x380:soc ret=-13
 drm-api-test-5032 [003] d..1   142.567988: rpm_return_int: rpm_suspend+0x158/0x670:b00000.gpu ret=0

And then the test tries to do a submission about 3 milliseconds later:
 drm-api-test-5032 [003] ....   142.570752: msm_gpu_submit: msm_gpu_submit: pm_runtime_get_sync
 drm-api-test-5032 [003] d..1   142.570760: rpm_resume: b00000.gpu flags-4 cnt-1 dep-0 auto-1 p-0 irq-0 child-0
 drm-api-test-5032 [003] d..1   142.570778: rpm_resume: b40000.arm,smmu flags-4 cnt-1 dep-0 auto-1 p-0 irq-0 child-0
 drm-api-test-5032 [003] d..1   142.571135: rpm_idle: b40000.arm,smmu flags-1 cnt-1 dep-0 auto-1 p-0 irq-0 child-0
 drm-api-test-5032 [003] d..1   142.571138: rpm_return_int: rpm_idle+0x3c/0x380:b40000.arm,smmu ret=-11
 drm-api-test-5032 [003] d..1   142.571146: rpm_return_int: rpm_resume+0x110/0x6b8:b40000.arm,smmu ret=0
 drm-api-test-5032 [003] ....   142.571920: msm_gpu_pm_resume: enable
 drm-api-test-5032 [003] d..1   142.575587: rpm_idle: b00000.gpu flags-1 cnt-1 dep-0 auto-1 p-0 irq-0 child-0
 drm-api-test-5032 [003] d..1   142.575590: rpm_return_int: rpm_idle+0x3c/0x380:b00000.gpu ret=-11
 drm-api-test-5032 [003] d..1   142.575600: rpm_idle: soc flags-5 cnt-0 dep-1 auto-1 p-0 irq-0 child-8
 drm-api-test-5032 [003] d..1   142.575601: rpm_return_int: rpm_idle+0x3c/0x380:soc ret=-13
 drm-api-test-5032 [003] d..1   142.575602: rpm_return_int: rpm_resume+0x110/0x6b8:b00000.gpu ret=0

Immediately following resume, the code goes into the GPU hardware init, which starts writing to the hardware. The kernel log from the same run shows that we go into HW init immediately after resume returns:

[  142.575607] START HW_INIT

Rob and I ran tests that show we get through several hardware writes in the init before the AHB timeout. My best guess from all of this is that the pm_runtime_put_sync() is triggering a timeout somewhere up the clock tree, and that timeout is not being canceled by the subsequent pm_runtime_get_sync(). Despite the fact that the clocks are enabled by resume (and, in this particular test run, I never disable the clocks), some other clock is being turned off at some point between the rpm_suspend at 142.567988 and several microseconds into the hardware init, some time after 142.575607.
I'm going to continue investigating, but this isn't my area of expertise, so if any of the clock folks out there think they might know what's going on please speak up here or in IRC.

Thanks,
Jordan

--
The Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum,
a Linux Foundation Collaborative Project