On Sat, May 16, 2020 at 9:45 AM Lukas Wunner <lukas@xxxxxxxxx> wrote: > > On Sat, May 16, 2020 at 12:37:17AM +0300, Andy Shevchenko wrote: > > On Fri, May 15, 2020 at 7:41 PM Lukas Wunner <lukas@xxxxxxxxx> wrote: > > > On Fri, May 15, 2020 at 05:27:25PM +0100, Mark Brown wrote: > > > > On Fri, May 15, 2020 at 05:58:01PM +0200, Lukas Wunner wrote: > > > > > However since commit ffbbdd21329f ("spi: create a message queueing > > > > > infrastructure"), spi_destroy_queue() is executed before unbinding the > > > > > slaves. It sets ctlr->running = false, thereby preventing SPI bus > > > > > access and causing unbinding of slave devices to fail. > > > > > > > > Devices should basically never fail an unbind operation - if something > > > > went seriously wrong there's basically nothing that can be done in terms > > > > of error handling and keeping the device around isn't going to help. > > > > > > I guess the word "fail" in the commit message invites misinterpretations. > > > The driver does unbind from the slave device, but the physical device is > > > not left in a proper state. E.g. interrupts may still be generated by > > > the device because writing a register to disable them failed. > > > > I didn't check a patch, but I see a bug on kernel bugzilla against > > spi-pxa2xx because of this. It requires quite untrivial ->remove() in > > order to quiescent the DMA and other activities. > > Yes from a quick look at spi-pxa2xx.c it's immediately obvious that > the use of devm_spi_register_controller() is likewise completely wrong. > > The crucial thing to understand is that the SPI driver's ->remove() > hook is executed *before* any device-managed resources are released. > pxa2xx_spi_remove() disables the clock, frees the IRQ, releases DMA, > so the SPI controller is no longer usable even though it's still > registered! Somehow this incorrect order got cargo-culted to dozens > of drivers over the years. > > We use SPI-attached Ethernet chips and when the SPI driver's module > is unloaded, the Ethernet driver's ->ndo_stop() hook is executed to > bring down the interface. For this it needs to communicate with the > Ethernet chip, but it can't because the ->remove() hook has already been > executed and unbinding of the SPI slave happens afterwards, when the > SPI controller is unregistered via devres_release_all(). > > There's another issue in spi-pxa2xx.c: It acquires a runtime PM ref > even though the driver core already does that. > > Do you have a link to the spi-pxa2xx.c bugzilla? Are you able to > test patches? I can submit a patch but I can only compile-test it, Here you are: https://bugzilla.kernel.org/show_bug.cgi?id=206403. There also a link to my GH tree where I tried to clean up a bit. And yes, I know about atomic handling bug there, but it's another story. I was able to reproduce the bug once or twice, but submitter has a test case with reproducibility close to 100%. -- With Best Regards, Andy Shevchenko