Hi Martin,

On 14/01/2019 22:01, Martin Sperl wrote:
> Hi Jon,
>
> On 14.01.2019, at 16:35, Jon Hunter <jonathanh@xxxxxxxxxx> wrote:
>
>> Hi Martin, Mark,
>>
>> [ 58.222033] spi_master spi1: could not stop message queue
>> [ 58.222038] spi_master spi1: queue stop failed
>> [ 58.222048] dpm_run_callback(): platform_pm_suspend+0x0/0x54 returns -16
>> [ 58.222052] PM: Device 7000da00.spi failed to suspend: error -16
>> [ 58.222057] PM: Some devices failed to suspend, or early wake event detected
>
> Unfortunately I have not been able to reproduce this in
> my test cases with the hw available to me.

Looking at both boards that fail, tegra30-cardhu-a04 and
tegra124-jetson-tk1, they both have a spi-flash. The compatible strings
for the spi flashes are "winbond,w25q32" and "winbond,w25q32dw",
respectively, which interestingly are not documented or used anywhere in
the kernel. It appears that there was a patch to fix this a few years
back, but it never got applied [0]. However, applying this patch does not
fix the issue. Furthermore, without this patch applied I see that the
spi flash is detected fine ...

[ 2.540395] m25p80 spi1.0: w25q32dw (4096 Kbytes)

So this is not related, but the main point is that the failure occurs
with a spi flash device.

> Looks as if there is something missing in spi_stop_queue that
> would wake the worker thread one last time without any delays
> and finish the hw shutdown immediately - it runs as a delayed
> task...
>
> One question: do you run any spi transfers in
> your test case before suspend?

No, and before suspending I dumped some of the spi stats and I see no
transfers/messages at all ...

Stats for spi1 ...
Bytes: 0
Errors: 0
Messages: 0
Transfers: 0

> /sys/class/spi_master/spi1/statistics/messages gives some
> counters on the number of spi messages processed which
> would give you an indication if that is happening.
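
For anyone else trying to reproduce this, here is a rough sketch of how
the counters under that statistics directory can be dumped in one go
before suspending. The helper name dump_spi_stats is made up for
illustration, and the sketch assumes the directory holds plain,
single-value counter files, which may not be true of every entry on
every kernel version.

```shell
#!/bin/sh
# Hypothetical helper (not part of the kernel or any tool): print every
# regular, readable file in a statistics directory as "name: value".
dump_spi_stats() {
    stats_dir="$1"
    for f in "$stats_dir"/*; do
        # Skip directories and anything we cannot read.
        if [ -f "$f" ] && [ -r "$f" ]; then
            printf '%s: %s\n' "$(basename "$f")" "$(cat "$f")"
        fi
    done
}

# Example call, using the path quoted above (adjust the controller name):
# dump_spi_stats /sys/class/spi_master/spi1/statistics
```

Running it once right after boot and once just before suspend should
show whether any messages were processed in between.
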
>
> It could be as easy as adding right after the first lock
> in spi_stop_queue:
> kthread_mod_delayed_work(&ctlr->kworker,
>                          &ctlr->pump_idle_teardown, 0);
> (plus maybe a yield or similar to allow the worker to
> quickly/reliably run on a single core machine)
>
> I hope that this initial guess helps.

Unfortunately, the above did not help and the issue persists.

Digging a bit deeper, I see that 'ctlr->queue' is now empty but the
'ctlr->busy' flag is set, and this is causing the 'could not stop
message queue' error.

It seems that __spi_pump_messages() is getting called several times
during boot when registering the spi-flash. Then, about 1 sec after the
spi-flash has been registered, spi_pump_idle_teardown() is called (as
expected), but it exits because 'ctlr->running' is true. However,
spi_pump_idle_teardown() is never called again, and when we suspend we
are stuck in the busy/running state. In this case, should something be
scheduling spi_pump_idle_teardown() again? Even if something does, I
don't see where the busy flag would be cleared in this path.

Cheers
Jon

[0] https://patchwork.kernel.org/patch/7021961/

-- 
nvpublic