On 15/01/2019 17:39, kernel@xxxxxxxxxxxxxxxx wrote: > Hi John! > >> On 15.01.2019, at 15:26, Jon Hunter <jonathanh@xxxxxxxxxx> wrote: >>> Looks as if there is something missing in spi_stop_queue that >>> would wake the worker thread one last time without any delays >>> and finish the hw shutdown immediately - it runs as a delayed >>> task... >>> >>> One question: do you run any spi transfers in >>> your test case before suspend? >> >> No and before suspending I dumped some of the spi stats and I see no >> tranfers/messages at all ... >> >> Stats for spi1 ... >> Bytes: 0 >> Errors: 0 >> Messages: 0 >> Transfers: 0 > > This... > >>> /sys/class/spi_master/spi1/statistics/messages gives some >>> counters on the number of spi messages processed which >>> would give you an indication if that is happening. >>> >>> It could be as easy as adding right after the first lock >>> in spi_stop_queue: >>> kthread_mod_delayed_work(&ctlr->kworker, >>> &ctlr->pump_idle_teardown, 0); >>> (plus maybe a yield or similar to allow the worker to >>> quickly/reliably run on a single core machine) >>> >>> I hope that this initial guess helps. >> >> Unfortunately, the above did not help and the issue persists. >> >> Digging a bit deeper I see that now the 'ctlr->queue' is empty but >> 'ctlr->busy' flag is set and this is causing the 'could not stop message >> queue' error. >> >> It seems that __spi_pump_messages() is getting called several times >> during boot when registering the spi-flash, then after the spi-flash has >> been registered, about a 1 sec later spi_pump_idle_teardown() is called >> (as expected), but exits because 'ctlr->running' is true. However, > > and this contradicts each other! > If there is a call to message pump, then we should process a message > and the counters should increase. > > If those counters do not increase then there is something strange. Actually no, this is nothing strange here at all. If you look Tegra has it's own 'transfer_one_message' handler (tegra_spi_transfer_one_message in drivers/spi/spi-tegra114.c) and this does not update any of the statistics. Looks like the statistics are not updated if a SPI controller has it own transfer_one_message hander which is a shame. > If we are called without anything to do then the pump should trigger a > tear down and stop. >> spi_pump_idle_teardown() is never called again and when we suspend we >> are stuck in the busy/running state. In this case should something be >> scheduling spi_pump_idle_teardown() again? Although even if it does I >> don't see where the busy flag would be cleared in this path? >> > > I am wondering where busy/running would be set in the first place > if there are no counters... Looks like running is set true on initialising the queue and busy is set true in __spi_pump_messages(). > Is it possible that the specific flash is not using the “normal” > spi_pump_message, but spi_controller_mem_ops operations? I definitely see __spi_pump_messages() being called which in turn calls tegra_spi_transfer_one_message(). > Maybe we are missing the teardown in that driver or something is > changing flags there. > > grepping a bit: > > I see spi_mem_access_start calling spi_flush_queue, which is calling > pump_message, which - if there is no transfer - will trigger a delayed > tear-down, while it maybe should not be doing that. > > Maybe spi_mem_access_end needs a call to spi_flush_queue as well? > > Unfortunately this is theoretical work and quite a lot of guesswork > without an actual device available for testing... Looking some more I think that the problem is ... 1. The controller is put into the 'running' state when spi_start_queue() is called (which itself is called during the registration of the spi controller). 2. After sending some messages to initialise the SPI flash, spi_pump_idle_teardown() is called but exits early because the controller is in the 'running' state and this prevents 'busy' from being cleared. 3. On entering suspend spi_stop_queue() is called but the controller is still in the 'busy' state and so fails to stop. So the problem I see is that 'busy' is never cleared and before it was. Furthermore, because 'busy' is never cleared, I see that the SPI controller is never pm-runtime suspended and cannot be suspended, and so this will have an impact on power. With your patch after booting I see ... $ cat /sys/class/spi_master/spi1/device/power/runtime_status active Reverting your patch, after booting I see ... $ cat /sys/class/spi_master/spi1/device/power/runtime_status suspended Does the device you are testing with support runtime-pm? If so what do you see? Cheers Jon -- nvpublic