Hi, > Am 25.04.2020 um 12:37 schrieb H. Nikolaus Schaller <hns@xxxxxxxxxxxxx>: > > >> Am 25.04.2020 um 12:29 schrieb H. Nikolaus Schaller <hns@xxxxxxxxxxxxx>: >> >> H >> The things start to get "fixed" when the hdq_isr >> jumps to 6 indicating >> >> OMAP_HDQ_INT_STATUS_RXCOMPLETE | OMAP_HDQ_INT_STATUS_TXCOMPLETE >> >> So I am getting more inclined to believe that it is >> not a power management issue but some piggybacked >> change to how interrupts are handled. >> Especially hdq_reset_irqstatus. >> >> So I will add a printk to hdq_reset_irqstatus >> to see what value it had before being reset. > > I now did check the log during boot and there is the > reverse situation. Initially it works but suddenly > hdq_isr becomes 6 and then trouble starts. > > So the key problem is that both, the RX and the TX > interrupts may be set and then, the code resets > everything to 0 and looses either one. > > I wonder if that is an issue by two processes reading > hdq in parallel. > > Another question is how independent command-writes + result-reads > are properly serialized and locked so that they don't overlap? I have reworked the way the spinlocks, setting and resetting of the hdq_irqstatus bits are done and now it works right from start of boot. Without any timeouts or delays. I am not exactly sure what went wrong, but it seems as if the read is already done when the write interrupt status bit is processed. Then, the old logic did wipe out both bits by hdq_reset_irqstatus() and the read code did timeout because it did not notice that the data had already been available. This may depend on other system activities so that it can explain why other tests didn't reveal it. omap_hdq_runtime_resume() and omap_hdq_runtime_suspend() also behave fine. Before I can post something I need to clean up my hacks and add similar fixes to omap_hdq_break() and omap_w1_triplet() where I hope that I don't break those... BR and thanks, Nikolaus