On Mon, Aug 9, 2021 at 6:50 AM Frieder Schrempf <frieder.schrempf@xxxxxxxxxx> wrote:
>
> On 09.08.21 13:01, Lucas Stach wrote:
> > Hi Frieder,
> >
> > On Thursday, 05.08.2021 at 20:56 +0200, Frieder Schrempf wrote:
> >> On 05.08.21 12:18, Frieder Schrempf wrote:
> >>> On 21.07.21 22:46, Lucas Stach wrote:
> >>>> Hi all,
> >>>>
> >>>> second revision of the GPC improvements and BLK_CTRL driver to make use
> >>>> of all the power-domains on the i.MX8MM. I'm not going to repeat the full
> >>>> blurb from the v1 cover letter here, but if you are not familiar with
> >>>> i.MX8MM power domains, it may be worth a read.
> >>>>
> >>>> This 2nd revision fixes the DT bindings to be valid yaml, some small
> >>>> failure path issues and most importantly the interaction with system
> >>>> suspend/resume. With the previous version some of the power domains
> >>>> would not come up correctly after a suspend/resume cycle.
> >>>>
> >>>> Updated testing git trees here, disclaimer still applies:
> >>>> https://git.pengutronix.de/cgit/lst/linux/log/?h=imx8m-power-domains
> >>>> https://git.pengutronix.de/cgit/lst/linux/log/?h=imx8m-power-domains-testing
> >>>
> >>> I finally did some tests on my side using USB, GPU and DSI (no PCIe, VPU, CSI so far) and the results are promising. Thanks for the effort!
> >>>
> >>> I will try to run some more automated suspend/resume and reboot test cycles over the weekend and report the results here afterwards.
> >>>
> >>
> >> Unfortunately I got some results sooner than I had hoped. I set up a simple loop to suspend/resume every few seconds and on the first run it took around 2-3 hours for the device to lock up on resume. On the second run it took less than half an hour. I had glmark2-es2-drm running in the background, but it looks like it crashed at some point before the lockup occurred.
> >>
> >> Of course this could also be unrelated and caused by some peripheral driver or something but the first suspicion is definitely the power domains.
> >>
> >> If you have any suggestions for which debug options to enable or where to add some printks, please let me know. If I do another run I would like to make sure that the resulting logs are helpful for debugging.
> >>
> >> And I would appreciate if someone else could try to reproduce this problem on his/her side. I use this simple script for testing:
> >>
> >> #!/bin/sh
> >>
> >> glmark2-es2-drm &
> >>
> >> while true;
> >> do
> >>     echo +10 > /sys/class/rtc/rtc0/wakealarm
> >>     echo mem > /sys/power/state
> >>     sleep 5
> >> done;
> >
> > Hm, that's unfortunate.
> >
> > I'm back from a two week vacation, but it looks like I won't have much
> > time available to look into this issue soon. It would be very helpful
> > if you could try to pinpoint the hang a bit more. If you can reproduce
> > the hang with no_console_suspend you might be able to extract a bit
> > more info in which stage the hang happens (suspend, resume, TF-A, etc.)
> > If the hang is in the kernel you might be able to add some prints to
> > the suspend/resume paths to be able to track down the exact point of
> > the hang.
> >
> > I'm happy to look into the issue once it's better known where to look,
> > but I fear that I won't have time to do the above investigation myself
> > short term. Frieder, is this something you could help with over the
> > next few days?
>
> I will see if I can find some time to track down the issue at least a little bit more. But I imagine it could get quite tedious if it takes up to several hours to reproduce the issue and I don't have much time to spare.
>
> @Peng, @Adam and everyone else: Any chance you could set up a similar test and try to reproduce this?

Right now I am on medical leave due to a broken wrist, and I won't be
able to help until it heals. Sorry.

adam

>
> On the other hand reboot cycle testing didn't show any lockup problems over more than 24 hours, so it seems like the issue is limited to resume.
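
For anyone setting up the reproduction, here is a minimal sketch of the quoted loop with a persistent cycle log, so that after a lockup the last completed cycle can be read back from storage. It is only an illustration, not part of Frieder's script: the log path, the overridable path variables, and the MAX_CYCLES and PAUSE knobs are all assumptions added here, mainly so the loop can be dry-run against ordinary files.

```shell
#!/bin/sh
# Sketch only: the suspend/resume loop from the quoted script plus a cycle log.
# All variable names and the log location below are hypothetical.
WAKEALARM=${WAKEALARM:-/sys/class/rtc/rtc0/wakealarm}
PM_STATE=${PM_STATE:-/sys/power/state}
LOG=${LOG:-/var/log/suspend-cycles.log}   # assumed location, adjust as needed
MAX_CYCLES=${MAX_CYCLES:-0}               # 0 = keep going until the board hangs
PAUSE=${PAUSE:-5}                         # seconds to stay awake between cycles

run_cycles() {
    n=0
    while [ "$MAX_CYCLES" -eq 0 ] || [ "$n" -lt "$MAX_CYCLES" ]; do
        n=$((n + 1))
        # Record the attempt and flush it before the system goes down.
        echo "cycle $n: suspending at $(date +%s)" >> "$LOG"
        sync
        echo +10 > "$WAKEALARM"
        echo mem > "$PM_STATE"
        # Only reached if the write to the state file returned.
        echo "cycle $n: resumed at $(date +%s)" >> "$LOG"
        sync
        sleep "$PAUSE"
    done
}
```

Appending a bare `run_cycles` call starts the loop; `glmark2-es2-drm &` can be started before it as in the original script. After a lockup, a trailing "suspending" entry without a matching "resumed" entry gives the cycle number at which the hang occurred, which may help correlate with console output captured under no_console_suspend.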