On Tue, May 04, 2021 at 03:54:00PM +0200, Frieder Schrempf wrote:
> On 03.05.21 15:54, Andy Shevchenko wrote:
> > On Mon, May 03, 2021 at 04:48:10PM +0300, Andy Shevchenko wrote:
> > > On Mon, May 03, 2021 at 04:44:24PM +0300, Andy Shevchenko wrote:
> > > > On Mon, May 03, 2021 at 03:11:40PM +0200, Frieder Schrempf wrote:
> > > > > Hi,
> > > > >
> > > > > with kernel 5.10.x and 5.12.x I'm getting a null pointer dereference
> > > > > exception from the mcp251x driver when I resume from sleep (see trace
> > > > > below).
> > > > >
> > > > > As far as I can tell this was working fine with 5.4. As I currently don't
> > > > > have the time to do further debugging/bisecting, for now I want to at least
> > > > > report this here.
> > > > >
> > > > > Maybe there is someone around who could already give a wild guess for what
> > > > > might cause this just by looking at the trace/code!?
> > > >
> > > > Does revert of c7299fea6769 ("spi: Fix spi device unregister flow") help?
> > >
> > > Other than that, bisecting will take not more than 3-4 iterations only:
> > > % git log --oneline v5.4..v5.10.34 -- drivers/net/can/spi/mcp251x.c
> > > 3292c4fc9ce2 can: mcp251x: fix support for half duplex SPI host controllers
> > > e0e25001d088 can: mcp251x: add support for half duplex controllers
> > > 74fa565b63dc can: mcp251x: Use readx_poll_timeout() helper
> > > 2d52dabbef60 can: mcp251x: add GPIO support
> > > cfc24a0aa7a1 can: mcp251x: sort include files alphabetically
> > > df561f6688fe treewide: Use fallthrough pseudo-keyword
> >
> > > 8ce8c0abcba3 can: mcp251x: only reset hardware as required
> >
> > And only smoking gun by analyzing the code is the above. So, for the first I
> > would simply check before that commit and immediately after (15-30 minutes of
> > work). (I would do it myself if I had a hardware at hand...)
>
> Thanks for pointing that out. Indeed when I revert this commit it works fine
> again.
>
> When I look at the change I see that queue_work(priv->wq,
> &priv->restart_work) is called in two cases, when the interface is brought
> up after resume and now also when the device is only powered up after resume
> but the interface stays down.
>
> The latter is a problem if the device was never brought up before, as the
> workqueue is only allocated and initialized in mcp251x_open().
>
> To me it looks like a proper fix would be to just move the workqueue init to
> the probe function to make sure it is available when resuming even if the
> interface was never up before.
>
> I will try this and send a patch if it looks good.

Sounds like a plan!

-- 
With Best Regards,
Andy Shevchenko
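
For illustration only, the failure mode described above (work queued on
resume while the workqueue is allocated only on open) can be sketched as
a small user-space model. This is not the mcp251x driver code and not
the eventual patch: struct model_wq, struct model_priv and all model_*
functions are hypothetical names invented here, and the real kernel
workqueue API is not used. The wq_in_probe flag stands in for the
proposed move of the workqueue init into the probe function.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for a kernel workqueue; NOT the real kernel API. */
struct model_wq {
    int queued;                 /* number of work items queued so far */
};

/* Hypothetical stand-in for the driver's private data. */
struct model_priv {
    struct model_wq *wq;        /* stays NULL until someone allocates it */
};

/* Allocate the queue once; returns 0 on success, -1 on allocation failure. */
static int model_alloc_wq(struct model_priv *priv)
{
    if (priv->wq)
        return 0;
    priv->wq = calloc(1, sizeof(*priv->wq));
    return priv->wq ? 0 : -1;
}

/* Probe: with wq_in_probe set, the queue exists from the start (proposed fix). */
static int model_probe(struct model_priv *priv, bool wq_in_probe)
{
    assert(priv != NULL);
    priv->wq = NULL;
    return wq_in_probe ? model_alloc_wq(priv) : 0;
}

/* Open: where the queue gets allocated in the behaviour described above. */
static int model_open(struct model_priv *priv)
{
    return model_alloc_wq(priv);
}

/*
 * Resume: queues restart work even if the interface was never opened.
 * Returns -1 where the real driver would dereference a NULL workqueue.
 */
static int model_resume(struct model_priv *priv)
{
    if (!priv->wq)
        return -1;
    priv->wq->queued++;
    return 0;
}

/* Remove: release the queue regardless of where it was allocated. */
static void model_remove(struct model_priv *priv)
{
    free(priv->wq);
    priv->wq = NULL;
}
```

In this model, model_probe(&p, false) followed directly by model_resume(&p)
returns -1, which corresponds to resuming a device whose interface was
never brought up. With model_probe(&p, true) the same resume call returns
0, because the queue already exists when the work is queued.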