On 6/28/24 5:16 AM, Marc Kleine-Budde wrote:
> On 28.06.2024 11:49:38, Oleksij Rempel wrote:
>> It seems to be spi_mux specific. We have seen similar trace on other system
>> with spi_mux.
>
> Here is the other backtrace from another imx8mp system with a completely
> different workload. Both have in common that they use a spi_mux on the
> spi-imx driver.
>
> Unable to handle kernel NULL pointer dereference at virtual address 0000000000000dd0
> Mem abort info:
>   ESR = 0x0000000096000004
>   EC = 0x25: DABT (current EL), IL = 32 bits
>   SET = 0, FnV = 0
>   EA = 0, S1PTW = 0
>   FSC = 0x04: level 0 translation fault
> Data abort info:
>   ISV = 0, ISS = 0x00000004, ISS2 = 0x00000000
>   CM = 0, WnR = 0, TnD = 0, TagAccess = 0
>   GCS = 0, Overlay = 0, DirtyBit = 0, Xs = 0
> user pgtable: 4k pages, 48-bit VAs, pgdp=0000000046760000
> [0000000000000dd0] pgd=0000000000000000, p4d=0000000000000000
> Internal error: Oops: 0000000096000004 [#1] PREEMPT SMP
> Modules linked in: can_raw can ti_ads7950 industrialio_triggered_buffer kfifo_buf spi_mux fsl_imx8_ddr_perf at24 flexcan caam can_dev error rtc_snvs imx8mm_thermal spi_imx capture_events_irq cfg80211 iio_trig_hrtimer industrialio_sw_trigger ind>
> CPU: 3 PID: 177 Comm: spi5 Not tainted 6.9.0 #1
> Hardware name: xxxxxxxxxxxxxxxx (xxxxxxxxx) (DT)
> pstate: 60000005 (nZCv daif -PAN -UAO -TCO -DIT -SSBS BTYPE=--)
> pc : spi_res_release+0x24/0xb8
> lr : spi_async+0xac/0x118
> sp : ffff8000823fbcc0
> x29: ffff8000823fbcc0 x28: 0000000000000000 x27: 0000000000000000
> x26: ffff8000807bef88 x25: ffff80008115c008 x24: 0000000000000000
> x23: ffff8000826c3938 x22: 0000000000000000 x21: ffff0000076a9800
> x20: 0000000000000000 x19: 0000000000000dc8 x18: 0000000000000000
> x17: 0000000000000000 x16: 0000000000000000 x15: 0000ffff88c0e760
> x14: 0000000000000000 x13: 0000000000000000 x12: 0000000000000000
> x11: ffff8000815a1f98 x10: ffff8000823fbb40 x9 : ffff8000807b8420
> x8 : ffff800081508000 x7 : 0000000000000004 x6 : 0000000003ce4c66
> x5 : 0000000001000000 x4 : 0000000000000000 x3 : 0000000001000000
> x2 : 0000000000000000 x1 : ffff8000826c38e0 x0 : ffff0000076a9800
> Call trace:
>  spi_res_release+0x24/0xb8
>  spi_async+0xac/0x118
>  spi_mux_transfer_one_message+0xb8/0xf0 [spi_mux]
>  __spi_pump_transfer_message+0x260/0x5d8
>  __spi_pump_messages+0xdc/0x320
>  spi_pump_messages+0x20/0x38
>  kthread_worker_fn+0xdc/0x220
>  kthread+0x118/0x128
>  ret_from_fork+0x10/0x20
> Code: a90153f3 a90363f7 91016037 f9403033 (f9400674)
> ---[ end trace 0000000000000000 ]---
>
> regards,
> Marc
>

Hi Oleksij and Marc,

I'm supposed to be on vacation, so I haven't looked into this deeply yet,
but I can see what is happening here.

spi_mux_transfer_one_message() is calling spi_async(), which is calling
__spi_optimize_message() on an already optimized message.

Then it also calls __spi_unoptimize_message(), which tries to release
resources. But this fails because the spi-mux driver has swapped out
the pointer to the device in the SPI message. This causes the wrong
ctlr to be passed to spi_res_release(), causing the crash.

I don't know if a proper fix could be quite so simple, but here is
something you could try (untested):

---
diff --git a/drivers/spi/spi-mux.c b/drivers/spi/spi-mux.c
index 5d72e3d59df8..ec837e28183d 100644
--- a/drivers/spi/spi-mux.c
+++ b/drivers/spi/spi-mux.c
@@ -42,6 +42,7 @@ struct spi_mux_priv {
 	void			(*child_msg_complete)(void *context);
 	void			*child_msg_context;
 	struct spi_device	*child_msg_dev;
+	bool			child_msg_pre_optimized;
 	struct mux_control	*mux;
 };
 
@@ -94,6 +95,7 @@ static void spi_mux_complete_cb(void *context)
 	m->complete = priv->child_msg_complete;
 	m->context = priv->child_msg_context;
 	m->spi = priv->child_msg_dev;
+	m->pre_optimized = priv->child_msg_pre_optimized;
 	spi_finalize_current_message(ctlr);
 	mux_control_deselect(priv->mux);
 }
@@ -116,10 +118,12 @@ static int spi_mux_transfer_one_message(struct spi_controller *ctlr,
 	priv->child_msg_complete = m->complete;
 	priv->child_msg_context = m->context;
 	priv->child_msg_dev = m->spi;
+	priv->child_msg_pre_optimized = m->pre_optimized;
 
 	m->complete = spi_mux_complete_cb;
 	m->context = priv;
 	m->spi = priv->spi;
+	m->pre_optimized = true;
 
 	/* do the transfer */
 	return spi_async(priv->spi, m);
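
To make the reasoning behind the patch easier to follow, here is a small
userspace model of the save/swap/restore idea. It is only a sketch and not
kernel code: every model_* name is hypothetical, the "owner" pointer is an
assumed stand-in for the controller that holds the message's resources, and
the nested submit is simplified to optimize on entry and unoptimize on exit
unless pre_optimized is set.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for illustration; not the kernel's definitions. */
struct model_dev { const char *name; };

struct model_msg {
	struct model_dev *spi;   /* device the message currently points at */
	struct model_dev *owner; /* device the message was optimized for */
	bool optimized;
	bool pre_optimized;
};

struct model_stats {
	int double_optimize;    /* optimize called on an already optimized message */
	int mismatched_release; /* release attempted via a device other than the owner */
};

static void model_optimize(struct model_msg *m, struct model_stats *st)
{
	if (m->optimized) {
		st->double_optimize++;
		return;
	}
	m->owner = m->spi;
	m->optimized = true;
}

static void model_unoptimize(struct model_msg *m, struct model_stats *st)
{
	/* Models the failure: resources are looked up via the wrong device. */
	if (m->spi != m->owner)
		st->mismatched_release++;
	m->optimized = false;
}

/* Simplified submit: optimize/unoptimize unless the caller already did. */
static void model_submit(struct model_dev *dev, struct model_msg *m,
			 struct model_stats *st)
{
	assert(dev != NULL && m != NULL);
	m->spi = dev;
	if (!m->pre_optimized)
		model_optimize(m, st);
	/* the transfer itself would run here */
	if (!m->pre_optimized)
		model_unoptimize(m, st);
}

/* Runs one forwarded message, with or without the pre_optimized marking. */
static struct model_stats model_run_scenario(bool mark_pre_optimized)
{
	struct model_dev child = { "child" }, parent = { "parent" };
	struct model_msg m = { .spi = &child };
	struct model_stats st = { 0, 0 };

	/* The outer submission optimizes the message for the child device. */
	model_optimize(&m, &st);

	/* Mux forwarding: save, swap, submit to the parent, restore. */
	struct model_dev *saved_dev = m.spi;
	bool saved_flag = m.pre_optimized;

	m.spi = &parent;
	if (mark_pre_optimized)
		m.pre_optimized = true;
	model_submit(&parent, &m, &st);
	m.spi = saved_dev;
	m.pre_optimized = saved_flag;

	/* The outer path releases through the child, as originally intended. */
	if (m.optimized)
		model_unoptimize(&m, &st);
	return st;
}
```

In this model, model_run_scenario(false) records one double optimize and
one mismatched release, which corresponds to the path in the backtrace,
while model_run_scenario(true) records neither because the nested submit
leaves the message alone. Whether the real driver needs more than this is
still an open question.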