Hello, On Feb 05, 2024 at 11:12:54 +0100, Miquel Raynal wrote: > Hi Théo, > > > > > The fatal conclusion of this is a deadlock: we acquire a lock on each > > > > operation but while running the operation, we might want to runtime > > > > resume and acquire the same lock. > > > > > > > > Anyway, those helpers (spi_controller_{suspend,resume}) are aimed at > > > > system-wide suspend and resume and should NOT be called at runtime > > > > suspend & resume. > > > > > > > > Side note: the previous implementation had a second issue. It acquired a > > > > pointer to both `struct cqspi_st` and `struct spi_controller` using > > > > dev_get_drvdata(). Neither embed the other. This lead to memory Oops, I seem to have overlooked this. I think it should've been spi_controller_get_devdata() > > > > corruption that was being hidden inside the big cqspi->f_pdata array on > > > > my setup. It was working until I tried changing the array side to its > > > > theorical max of 4, which lead to the discovery of this gnarly bug. > > > > > > > > Fixes: 0578a6dbfe75 ("spi: spi-cadence-quadspi: add runtime pm support") > > > > Fixes: 2087e85bb66e ("spi: cadence-quadspi: fix suspend-resume implementations") Thanks for the fixes. > > > > > > Your commit log makes total sense but I believe the diff is gonna break > > > again the suspend to RAM operation. This is only my understanding > > > right after quickly going through the whole story, so maybe I'm > > > totally off topic. > > > > The current ->runtime_suspend() implementation would indeed (probably) > > work for suspend-to-RAM if it wasn't for the wrong pointers to cqspi > > and spi_controller (see side note from commit message). > > Yeah, this probably needs to be fixed aside. > > > I've not found a moment where `struct cqspi_st` embed `struct > > spi_controller` at its start, so I do not believe this has ever worked. I don't know how it worked either, but I had definitely tested and provided logs at the time of posting the series, https://lore.kernel.org/all/20230417091027.966146-1-d-gole@xxxxxx/ > > It might be the result of a mistake while porting a patch from a branch > > that included other changes. Hmm, could be, not entirely sure now. But I did test it and now don't know how it had worked with that wrong pointer now that I see that mistake. > > > > > What happened if I understand the two commits blamed above: > > > > > > - There were PM hooks. > > > - Someone turned them into runtime PM hooks (breaking regular > > > suspend/resume). > > > - Someone else added the "missing" suspend/resume logic inside the > > > runtime PM hooks to fix suspend and resume. > > > - You are removing this logic because it leads to deadlocks. > > > > > > There was likely a misconception of what is expected in both cases > > > (quick and small power savings vs. full power cycle/loosing the whole > > > configuration). The context was as follows, The upstream cqspi driver prior to this: https://lore.kernel.org/all/20230417091027.966146-1-d-gole@xxxxxx/ series had buggy suspend resume. That needed fixing hence I added the first patch that introduced the buggy pointer but somehow still ended up working after suspend resume. After that, I also wanted the driver to support runtime_pm. I thought that both system suspend and runtime pm would have similar requirements from a driver POV since the IP essentially would turn off and from it's view would need system suspend like suspend resume calls. > > > > > > I would propose instead to create two distinct set of functions: > > > - One for runtime PM > > > - One for suspend/resume > > > This way the runtime PM no longer deadlocks and people using > > > suspend/resume won't get affected? I don't know if your runtime hooks > > > *will* always be called during a suspend/resume. I hope so, which would > > > make the split quite easy and without any code duplication. > > > > That does indeed sound like the right approach. Runtime hooks can be > > called from suspend/resume if needs be. Runtime PM then gets disabled > > at the late stage. > > Would make sense indeed. Now that I look at it, perhaps it is best to have 2 seperate calls for runtime and system pm. > > > I do not believe currently system-wide suspend can be working. > > spi_controller_{suspend,resume} are being called with a bogus pointer. > > This makes me ask: should the system-wide suspend/resume part be > > addressed with this patch or a follow-up? It feels like a separate > > concern to me. > > Probably two patches, yes. Yes, I think it best that we add a proper system suspend and runtime pm support for this driver. Again, thanks for catching this bug and reporting a fix. I also have an SK-AM62 handy which uses this ospi controller so let me see if I can help test your patches with system and runtime pm as well whenever you do post them. -- Best regards, Dhruva Gole <d-gole@xxxxxx>