On 09.08.21 13:01, Lucas Stach wrote:
> Hi Frieder,
>
> On Thursday, 05.08.2021 at 20:56 +0200, Frieder Schrempf wrote:
>> On 05.08.21 12:18, Frieder Schrempf wrote:
>>> On 21.07.21 22:46, Lucas Stach wrote:
>>>> Hi all,
>>>>
>>>> second revision of the GPC improvements and BLK_CTRL driver to make use
>>>> of all the power-domains on the i.MX8MM. I'm not going to repeat the full
>>>> blurb from the v1 cover letter here, but if you are not familiar with
>>>> i.MX8MM power domains, it may be worth a read.
>>>>
>>>> This 2nd revision fixes the DT bindings to be valid yaml, some small
>>>> failure path issues and most importantly the interaction with system
>>>> suspend/resume. With the previous version some of the power domains
>>>> would not come up correctly after a suspend/resume cycle.
>>>>
>>>> Updated testing git trees here, disclaimer still applies:
>>>> https://git.pengutronix.de/cgit/lst/linux/log/?h=imx8m-power-domains
>>>> https://git.pengutronix.de/cgit/lst/linux/log/?h=imx8m-power-domains-testing
>>>
>>> I finally did some tests on my side using USB, GPU and DSI (no PCIe, VPU, CSI so far) and the results are promising. Thanks for the effort!
>>>
>>> I will try to run some more automated suspend/resume and reboot test cycles over the weekend and report the results here afterwards.
>>>
>>
>> Unfortunately I got some results sooner than I had hoped. I set up a simple loop to suspend/resume every few seconds and on the first run it took around 2-3 hours for the device to lock up on resume. On the second run it took less than half an hour. I had glmark2-es2-drm running in the background, but it looks like it crashed at some point before the lockup occurred.
>>
>> Of course this could also be unrelated and caused by some peripheral driver or something but the first suspicion is definitely the power domains.
>>
>> If you have any suggestions for which debug options to enable or where to add some printks, please let me know. If I do another run I would like to make sure that the resulting logs are helpful for debugging.
>>
>> And I would appreciate if someone else could try to reproduce this problem on his/her side. I use this simple script for testing:
>>
>> #!/bin/sh
>>
>> glmark2-es2-drm &
>>
>> while true;
>> do
>>   echo +10 > /sys/class/rtc/rtc0/wakealarm
>>   echo mem > /sys/power/state
>>   sleep 5
>> done;
>
> Hm, that's unfortunate.
>
> I'm back from a two week vacation, but it looks like I won't have much
> time available to look into this issue soon. It would be very helpful
> if you could try to pinpoint the hang a bit more. If you can reproduce
> the hang with no_console_suspend you might be able to extract a bit
> more info in which stage the hang happens (suspend, resume, TF-A, etc.)
> If the hang is in the kernel you might be able to add some prints to
> the suspend/resume paths to be able to track down the exact point of
> the hang.
>
> I'm happy to look into the issue once it's better known where to look,
> but I fear that I won't have time to do the above investigation myself
> short term. Frieder, is this something you could help with over the
> next few days?

I will see if I can find some time to track down the issue at least a little bit more. But I imagine it could get quite tedious if it takes up to several hours to reproduce the issue, and I don't have much time to spare.

@Peng, @Adam and everyone else: Any chance you could set up a similar test and try to reproduce this?

On the other hand, reboot cycle testing didn't show any lockup problems over more than 24 hours, so it seems like the issue is limited to resume.
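
For anyone setting up a similar test, here is a rough sketch of how the loop quoted above could be instrumented so that the last completed cycle is still on disk after a lockup. This is only an illustration, not what was run for the results reported in this thread: the variables WAKEALARM, POWER_STATE, CYCLE_LOG, MAX_CYCLES, SLEEP_SECS and RUN_LOOP, as well as the log path and the function names, are made up for this sketch. It leaves out the glmark2-es2-drm background load and is meant to be combined with no_console_suspend on the kernel command line.

```shell
#!/bin/sh
# Sketch only: instrumented variant of the suspend/resume test loop.
# All variables below are hypothetical knobs for this example; the
# defaults match the sysfs paths used in the original script.
WAKEALARM="${WAKEALARM:-/sys/class/rtc/rtc0/wakealarm}"
POWER_STATE="${POWER_STATE:-/sys/power/state}"
CYCLE_LOG="${CYCLE_LOG:-/var/log/suspend-cycles.log}"   # assumed log location
MAX_CYCLES="${MAX_CYCLES:-0}"                           # 0 = loop forever
SLEEP_SECS="${SLEEP_SECS:-5}"

suspend_cycle() {
    # $1 = cycle number
    echo "cycle $1: suspending" >> "$CYCLE_LOG"
    sync    # flush to storage so the entry survives a hard lockup
    echo +10 > "$WAKEALARM"
    echo mem > "$POWER_STATE"
    # Reached only if the system came back from suspend.
    echo "cycle $1: resumed" >> "$CYCLE_LOG"
    sync
}

run_cycles() {
    n=0
    while [ "$MAX_CYCLES" -eq 0 ] || [ "$n" -lt "$MAX_CYCLES" ]; do
        n=$((n + 1))
        suspend_cycle "$n"
        sleep "$SLEEP_SECS"
    done
}

# Start the loop only when explicitly requested, e.g.
#   RUN_LOOP=1 sh suspend-loop.sh
if [ "${RUN_LOOP:-0}" = "1" ]; then
    run_cycles
fi
```

If the last line in the log after a lockup reads "suspending" without a matching "resumed", the hang happened somewhere between entering suspend and returning to userspace; together with the console output this should at least narrow down the stage.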