Re: [PATCH 1/5] spi: Fix controller unregister order

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Sat, May 16, 2020 at 9:45 AM Lukas Wunner <lukas@xxxxxxxxx> wrote:
>
> On Sat, May 16, 2020 at 12:37:17AM +0300, Andy Shevchenko wrote:
> > On Fri, May 15, 2020 at 7:41 PM Lukas Wunner <lukas@xxxxxxxxx> wrote:
> > > On Fri, May 15, 2020 at 05:27:25PM +0100, Mark Brown wrote:
> > > > On Fri, May 15, 2020 at 05:58:01PM +0200, Lukas Wunner wrote:
> > > > > However since commit ffbbdd21329f ("spi: create a message queueing
> > > > > infrastructure"), spi_destroy_queue() is executed before unbinding the
> > > > > slaves.  It sets ctlr->running = false, thereby preventing SPI bus
> > > > > access and causing unbinding of slave devices to fail.
> > > >
> > > > Devices should basically never fail an unbind operation - if something
> > > > went seriously wrong there's basically nothing that can be done in terms
> > > > of error handling and keeping the device around isn't going to help.
> > >
> > > I guess the word "fail" in the commit message invites misinterpretations.
> > > The driver does unbind from the slave device, but the physical device is
> > > not left in a proper state.  E.g. interrupts may still be generated by
> > > the device because writing a register to disable them failed.
> >
> > I didn't check a patch, but I see a bug on kernel bugzilla against
> > spi-pxa2xx because of this. It requires quite untrivial ->remove() in
> > order to quiescent the DMA and other activities.
>
> Yes from a quick look at spi-pxa2xx.c it's immediately obvious that
> the use of devm_spi_register_controller() is likewise completely wrong.
>
> The crucial thing to understand is that the SPI driver's ->remove()
> hook is executed *before* any device-managed resources are released.
> pxa2xx_spi_remove() disables the clock, frees the IRQ, releases DMA,
> so the SPI controller is no longer usable even though it's still
> registered!  Somehow this incorrect order got cargo-culted to dozens
> of drivers over the years.
>
> We use SPI-attached Ethernet chips and when the SPI driver's module
> is unloaded, the Ethernet driver's ->ndo_stop() hook is executed to
> bring down the interface.  For this it needs to communicate with the
> Ethernet chip, but it can't because the ->remove() hook has already been
> executed and unbinding of the SPI slave happens afterwards, when the
> SPI controller is unregistered via devres_release_all().
>
> There's another issue in spi-pxa2xx.c:  It acquires a runtime PM ref
> even though the driver core already does that.
>
> Do you have a link to the spi-pxa2xx.c bugzilla?  Are you able to
> test patches?  I can submit a patch but I can only compile-test it,

Here you are: https://bugzilla.kernel.org/show_bug.cgi?id=206403.
There also a link to my GH tree where I tried to clean up a bit. And
yes, I know about atomic handling bug there, but it's another story.

I was able to reproduce the bug once or twice, but submitter has a
test case with reproducibility close to 100%.

-- 
With Best Regards,
Andy Shevchenko



[Index of Archives]     [Linux Kernel]     [Linux ARM (vger)]     [Linux ARM MSM]     [Linux Omap]     [Linux Arm]     [Linux Tegra]     [Fedora ARM]     [Linux for Samsung SOC]     [eCos]     [Linux Fastboot]     [Gcc Help]     [Git]     [DCCP]     [IETF Announce]     [Security]     [Linux MIPS]     [Yosemite Campsites]

  Powered by Linux